A large capacity ATM core switch architecture is disclosed,
which supports multiple traffic classes and quality-of-service
(QoS) guarantees. The switch supports both real-time traffic
classes with strict QoS requirements, e.g., CBR and VBR, and
non-real-time traffic classes with less stringent requirements,
e.g., ABR and UBR. The architecture also accommodates real-time
and non-real-time multicast flows in an efficient manner. The
switch consists of a high-speed core module that interconnects
input/output modules with large buffers and intelligent
scheduling/buffer management mechanisms. The scheduling can be
implemented using a novel dynamic rate control, which controls
internal congestion and achieves fair throughput performance
among competing flows at switch bottlenecks. In the dynamic rate
control scheme, flows are rate-controlled according to congestion
information observed at bottleneck points within the switch. Each
switch flow is guaranteed a minimum service rate plus a dynamic
rate component which distributes any unused bandwidth in a fair
manner.

LARGE CAPACITY, MULTICLASS CORE
ATM SWITCH ARCHITECTURE

This application relates to U.S. application Ser. No.
08/929,820 which is incorporated herein by reference. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The subject invention relates to Asynchronous Transfer 
Mode (ATM) networks and, more specifically, to a large 
capacity, multiclass core ATM switch capable of efficiently
serving requests originated from various classes of sources, 
such as those defined by the ATM Forum. 

2. Description of the Related Art 

Historically, telephony networks and computing networks
have been developed in diverging directions. To ensure
effective real time communication, telephony TDM networks
establish a channel which is maintained for the
duration of the call. On the other hand, since most data 
transferred on computer networks is not real time data, 
packet switching routes the packets without establishing a 
channel. One problem with the TDM networks is that a 
source may sit idle, unnecessarily occupying an established 
channel. One problem with packet switching is that it 
requires high protocol overhead and is, therefore, not suit- 
able for real time communication. 

Asynchronous Transfer Mode (ATM) technology has 
emerged as the key technology for future communication 
switching and transmission infrastructures. For an informative
collection of notes on ATM networks, the reader is referred to:
Lecture Notes in Computer Science, Broadband Network Teletraffic,
James Roberts, Ugo Mocci and Jorma Virtamo (Eds.), vol. 1155,
Springer 1991, ISBN 3-540-61815-5. The main strength of ATM
networks lies in their potential for supporting applications with
widely different traffic characteristics and quality-of-service
(QoS) requirements. The goal of ATM networks is to combine and
harness the advantages of TDM networks and packet switching while
eliminating their respective disadvantages. Thus, ATM switching
will be able to provide a single network which replaces the TDM
and packet switching networks.

The ATM Forum has established various guidelines for 
ATM design, which can be found in the various publications 
of the ATM Forum. However, for the reader's convenience, 
certain relevant guidelines and acronyms are described
hereinbelow. 

Currently, the ATM Forum has established four main 
classes of traffic, generally divided into real time traffic and 
non-real time traffic. Constant Bit Rate (CBR) is used for 
real time traffic, i.e., mainly for audio. Admission of a CBR
call request can be determined by the requested peak rate.
Variable Bit Rate (VBR) can be used for video transmission;
however, since it is very bursty, admission of a VBR call
request needs to account for peak rate, sustainable rate, and
burst size. Even upon admittance, it is desirable to condition
the transmission from such a source, such as by using leaky
buckets. CBR and VBR are real time traffic classes.
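
The leaky-bucket conditioning mentioned above can be sketched as follows. This is a minimal illustration in the virtual-scheduling form of the generic cell rate algorithm, not the mechanism of the inventive switch; the class name `LeakyBucket` and the parameters `increment` (nominal cell spacing, the reciprocal of the policed rate) and `limit` (tolerance) are assumptions chosen for illustration.

```python
class LeakyBucket:
    """Leaky-bucket conformance test in virtual-scheduling form.

    Illustrative only: names and parameters are assumptions.
    increment: nominal spacing between cells (1 / policed rate).
    limit:     how much earlier than scheduled a cell may arrive.
    """

    def __init__(self, increment, limit):
        self.increment = increment
        self.limit = limit
        self.tat = 0.0  # theoretical arrival time of the next cell

    def conforms(self, arrival_time):
        # A cell arriving more than `limit` ahead of schedule is
        # non-conforming; the schedule is left unchanged.
        if arrival_time < self.tat - self.limit:
            return False
        # Conforming cell: push the schedule forward by one spacing.
        self.tat = max(arrival_time, self.tat) + self.increment
        return True


bucket = LeakyBucket(increment=10.0, limit=5.0)
arrivals = (0.0, 4.0, 5.0, 14.0, 30.0)
results = [bucket.conforms(t) for t in arrivals]
```

With a spacing of 10 and a tolerance of 5, the cells arriving at times 4 and 14 are flagged as non-conforming, while the others pass.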

Available Bit Rate (ABR) and Unspecified Bit Rate 
(UBR) are non-real time traffic, and are mainly used for 
computer communication. Conventionally, ABR traffic is 
controlled using a closed-loop feedback, which accounts for 
about 3% overhead. Generally, the source generates 
Resource Management Cells (RM cells) which propagate 
through the network. As each RM cell passes through a
switch, it is updated to indicate the supportable rate, i.e., the
rate at which the source should transmit the data (generally
called explicit rate). These RM cells are fed back to the source so
that the source may adjust its transmission rate accordingly.
It should be appreciated that such a feedback system has
substantial delay and, therefore, cannot be used for real time 
traffic. 
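
The way an RM cell accumulates the supportable rate along its path can be sketched as below. This is a simplified illustration under the assumption that each switch only ever lowers the explicit rate field to its own supportable rate; the function names and the example rates are hypothetical.

```python
def update_explicit_rate(rm_cell_er, local_supportable_rate):
    """Rate written back into the RM cell by one switch.

    Assumption for illustration: a switch may lower the explicit
    rate carried in the cell but never raises it.
    """
    return min(rm_cell_er, local_supportable_rate)


def path_explicit_rate(source_er, supportable_rates):
    """Explicit rate seen by the source after the RM cell has
    crossed every switch on the path."""
    er = source_er
    for rate in supportable_rates:
        er = update_explicit_rate(er, rate)
    return er


# Hypothetical path: three switches supporting 100, 40 and 60 Mbps.
rate_fed_back = path_explicit_rate(155.0, [100.0, 40.0, 60.0])
```

The source ends up limited by the tightest bottleneck on the path, here 40 Mbps, and only learns of it after a full round trip, which is the delay referred to above.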

Depending on the class of the transmission, the source 
would request the appropriate Quality of Service (QoS). 
Generally, QoS is determined with reference to transmission 
delay, cell loss probability, and cell delay variation. As
noted above, even when a call is admitted, the source's 
transmission may be regulated, for example, by controlling 
the peak rate using leaky buckets. Therefore, in the connec- 
tion set-up, the source would negotiate for the appropriate 
Usage Parameter Control (UPC) values, and indicate the
QoS desired. Then the Connection Admittance Control 
(CAC) would determine whether the network can support 
the call. 

The source would also indicate a destination address.
Using the destination address, the ATM network would
establish a Virtual Channel (VC) and provide the source with
the appropriate VC indicator. The source would then insert
the VC indicator in each transmitted cell. The channel would
remain constant for the duration of the call, i.e., all cells of
the call would be routed via the same channel. However, it
is termed a virtual channel since it may be shared with other
sources, i.e., there is no one-to-one correspondence between
a channel and a source.

Generally, the admitted calls would be associated with
certain buffers in the ATM switch, and a scheduling algorithm
would determine which buffer, i.e., which call, is to be
served at any given time. The scheduling should preferably
account for the QoS guaranteed during the call admittance,
and ensure fair sharing of the network resources. It has also
been advocated that the algorithm be work conserving, i.e.,
that it should not idle if cells are present in a buffer.

A variety of switch architectures have been proposed for 
ATM networks. A switch may consist of a single stage or 
multiple stages of smaller single stage switches. Switches
can be generally classified according to the location of the
cell buffers, i.e., input buffered or output buffered. It is
well-known that output-buffering achieves optimum
throughput (see, e.g., M. J. Karol, M. G. Hluchyj, and S. P.
Morgan, Input vs. Output Queueing on a Space-Division
Packet Switch, IEEE Trans. Comm., Vol. 35, pp. 1347-1356,
December 1987). However, an output-buffered architecture
requires the output buffers to operate at an access speed of
N times the line rate, where N is the number of input ports.
The factor of N speedup can be reduced to L=8 by using the
so-called "knockout principle" (see, Y. S. Yeh, M. G.
Hluchyj, and A. S. Acampora, The Knockout Switch: A
Simple, Modular Architecture for High-Performance Packet
Switching, IEEE J. Select. Areas Comm., Vol. 5, pp.
1274-1283, October 1987). However, unwanted cell loss
may occur when the switch is stressed with nonuniform
traffic patterns. Shared memory switches also require buffers
with N times speed-up. 

Input-buffered switches do not require any speed-up, but
suffer lower throughput due to head-of-line blocking. That
is, a cell at the head of the input buffer queue blocks all other
cells in the buffer until the destination output line is ready to
accept the head-of-line cell. However, it may be the case that
other destination output lines are ready to accept other cells
which are blocked by the head-of-line cell. This may lead to
inefficient use of the bandwidth and cause unnecessary
delays.

Current ATM switches have relatively simple scheduling
and buffer management mechanisms with limited support
for QoS. On the other hand, as ATM technology proliferates
in the wide area network (WAN) carrier market, more
sophisticated switches are needed in the WAN core which
can handle much larger volumes of traffic from a more
diverse set of applications. The next generation WAN core
switches will have large capacities and the ability to provide
QoS support for multiple classes of traffic. Therefore, there
is a need for a switch capable of supporting such diverse
traffic.

SUMMARY OF THE INVENTION

The present invention provides a large capacity ATM
switch which supports multiple traffic classes and
quality-of-service (QoS) guarantees. The switch supports both
real-time traffic classes with strict QoS requirements, e.g., CBR
and VBR, and non-real-time traffic classes with less stringent
requirements, e.g., ABR and UBR. The architecture can
also accommodate real-time and non-real-time multicast
flows in an efficient manner. The switch is based on an
input-output buffered architecture with a high-speed core
switch module which interconnects input/output modules
with large buffers. Controlled class-based access is provided
to the core switch module through intelligent scheduling and
queue management mechanisms.

The switch would preferably be utilized in conjunction
with a new scheduling method, dynamic rate control (DRC),
invented by the present inventors and described in the
related U.S. patent application Ser. No. 08/924,820. The
inventive DRC controls internal congestion and achieves
fair throughput performance among competing flows at
switch bottlenecks. This is achieved via closed-loop control
using a proportional-derivative (PD) controller at each
bottleneck. The DRC scheme guarantees each flow a minimum
service rate plus a dynamic rate component which
distributes any unused bandwidth in a fair manner. This
forms the basis for an integrated scheme which can provide
QoS for different traffic classes.

In the large switch, the DRC scheduling mechanism
operates in conjunction with intelligent queue management
mechanisms. The DRC scheduler detects congestion at
bottleneck points in the switch and alleviates the congestion
in a controlled manner by moving cell queueing towards the
input side of the switch, where cell discard mechanisms such
as early packet discard (EPD) and partial packet discard
(PPD) may be applied to individual class queues. Also, cells
tagged as low priority, i.e., with the cell loss priority (CLP)
bit set to one, are dropped when a queue exceeds a threshold.
With DRC, the cell discard mechanisms operate more
effectively, since cells are dropped in a controlled manner
according to the level of switch congestion.

The inventive large capacity switch described herein
represents a significant advance over current switches both
in aggregate throughput and in support for multiple QoS
classes. The design has the flexibility and scalability to meet
the needs of present and future high performance ATM
networks.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram depicting the architecture of the core
switch module according to the preferred embodiment of the
present invention.

FIG. 2a depicts the structure of the input and output
modules configured for per VC queuing according to the
preferred embodiment of the present invention.

FIG. 2b depicts the structure of the input and output
modules configured for per class-output port queuing
according to the preferred embodiment of the present
invention.

FIG. 2c depicts the structure of the input and output
modules configured for per class-output line queuing
according to the preferred embodiment of the present
invention.

FIG. 3 depicts the structure of the input and output
modules according to the preferred embodiment of the
present invention.

FIG. 4 exemplifies a scheduler with minimum rate shaping.

FIG. 5 exemplifies a scheduler implementing the inventive
scheduling.

FIG. 6 depicts a closed-loop rate control.

FIG. 7 is a flow diagram of the algorithm for scheduling
cells according to timestamps.

FIG. 8 is a flow diagram of the algorithm for rescheduling
of cells according to timestamps.

FIG. 9 is a flow diagram for serving cells from virtual
queues.

FIG. 10 [...] according to the preferred embodiment of the
present invention.

FIG. 11 is a flow chart for the algorithm for rate
computation for DRC scheduling.

FIG. 12 is a flow chart for explicit rate computation for
ABR.

FIG. 13 is a flow chart of the IRR filter for ABR.

FIG. 14 is a flow chart for the high gain filter for ABR.

FIG. 15 is a flow chart of a low gain filter for ABR.

FIG. 16 exemplifies two cell stream flows loading an
output port of the core switch.

FIG. 17 is a graph of the data collected for a simulation
performed with the model of FIG. 16, depicting the
convergence of DRC rates for CBR flow and UBR flow.

FIG. 18 exemplifies three cell stream flows loading two
output ports of the core switch.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

1. General Structure

In its preferred embodiment, the inventive large capacity
switch is a single stage switch which may be used as a
switching element in a still larger capacity multistage
switch. The inventive large capacity switch may be classified
as an input-output buffered switch (cf. R. Fan, H.
Suzuki, K. Yamada, and N. Matsuura, Expandable ATOM
Architecture (XATOM) for ATM LANs, in Proc. ICC
'94, May 1994). The goal of input-output
buffered architecture is to combine the strengths of input and
output buffers.

In the preferred embodiment, the output buffers are small,
fast buffers which are part of the core switch module. Cell
queueing occurs primarily at the input modules, in which the
buffers operate at the line speed. Head-of-line blocking is
avoided by queueing cells in the input module according to
destination output port or according to destination output
line. This architecture achieves the throughput of
output-buffered architectures, without incurring the expense of
fast, large output buffers. Moreover, buffering at the input ports
is more efficient than output buffering. For the same cell loss
performance, fewer buffers overall are needed when cells are
queued at the input ports rather than at the output ports.
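
The benefit of queueing cells by destination rather than in a single FIFO can be illustrated with a toy model of one cell time. This is a sketch under simplifying assumptions, not the scheduler of the inventive switch: each cell is represented only by its destination output port number, each input sends at most one cell per cell time, each ready output accepts at most one cell, and all function names are hypothetical.

```python
from collections import deque


def fifo_transfers(input_queues, ready_outputs):
    """One cell time with a single FIFO per input: only the
    head-of-line cell of each input may leave."""
    free = set(ready_outputs)
    sent = 0
    for queue in input_queues:
        if queue and queue[0] in free:
            free.remove(queue.popleft())
            sent += 1
    return sent


def per_destination_transfers(input_queues, ready_outputs):
    """One cell time when each input keeps its cells sorted by
    destination: the oldest cell bound for any free output may
    leave, so a busy destination does not block the others."""
    free = set(ready_outputs)
    sent = 0
    for queue in input_queues:
        dest = next((d for d in queue if d in free), None)
        if dest is not None:
            queue.remove(dest)  # removes the oldest cell for dest
            free.remove(dest)
            sent += 1
    return sent


def example_inputs():
    # Two inputs; both head-of-line cells are bound for output 0.
    return [deque([0, 1]), deque([0, 2])]


# Output 0 is busy; outputs 1 and 2 are ready.
blocked = fifo_transfers(example_inputs(), {1, 2})
unblocked = per_destination_transfers(example_inputs(), {1, 2})
```

In this example the FIFO arrangement transfers no cells, because both head-of-line cells wait for the busy output 0, whereas the per-destination arrangement transfers two cells in the same cell time.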
The backbone of the inventive large multiclass switch is
a new high-speed core switch element which provides fast,
simple, classless, and loss-free switching. Controlled access
to the core switch module is provided through intelligent
scheduling mechanisms at the input modules (IM). The input
modules may be arranged to allow a per class queuing or a
per virtual channel queuing.

An overall view of the preferred embodiment of the large
switch is illustrated in FIG. 1. The core switch module, 10,
in the illustrated embodiment, consists of 16 input ports, IP1
to IP16, 16 unicast output ports OP1 to OP16, and one
multicast output port MOP, all connected to a high-speed
TDM bus 20. In the exemplary embodiment, the input and
output ports of the core module operate at the rate of 2.4
Gbps and the TDM bus operates at the rate of 40 Gbps. In
one cell time, the core switch module can switch (at most)
one cell from each input port to any of the output ports.

An input module (IMi) is attached to each input port IPi
of the core module. The output line capacity of the input
modules IMi is 2.4 Gbps. The input side of the input module
IMi is coupled to input lines IL1i-IL16i which may be
configured in one of three ways: 1) one 2.4 Gbps line; 2) four
622 Mbps lines; 3) sixteen 155 Mbps lines. In all cases of the
exemplary embodiment, the aggregate input line capacity to
the input module IMi is 2.4 Gbps. Notably, the input lines
IL1i-IL16i can carry transmission of sources classified in
different classes with different QoS requirements.

The input modules can be arranged according to per
virtual channel (VC) queuing (FIG. 2a), per class queuing
according to output port (FIG. 2b), and per class queuing
according to output line (FIG. 2c). Per VC queuing provides
the best throughput, but requires a large number of buffers
in each input module. Therefore, while per VC queuing is
preferable from the design standpoint, it may not be
preferable from the implementation standpoint.

As shown in FIG. 2b, in a per class queuing according to
output port, each input module IMi comprises a number of
layers/planes LOP1-LOP16 corresponding to the number of
output modules, i.e., in the exemplary embodiment 16
layers. Each layer includes several buffers IB1-IBk
corresponding to the number, k, of classes sought to be supported.
Each input buffer IBi is guaranteed a service rate Ri1-Rik.
Thus, an incoming cell is routed to the proper layer
corresponding to the output port destination, and to the
appropriate input buffer within the layer depending upon its class.
The DRC rate feedback is provided from the output module
and corresponds to the load on the output ports.

As shown in FIG. 2c, two sets of layers/planes are
provided in the input module when the switch is arranged for
a per class queuing according to output lines. First, the
input module is divided into output port planes OP1-OP16
corresponding to the output ports. Then, each output port
plane is divided into output line planes OL1-OLk
corresponding to the output lines. Finally, each output line plane
includes a plurality of buffers corresponding to the classes.
In this case, two DRC rate feedbacks are provided, one
indicating the load on the output ports and one indicating the
load on the output lines.

Similarly, an output module (OMi) is attached to each
unicast output port OPi. The output side of an OM may
consist of one, four, or sixteen output lines with an aggregate
output line capacity of 2.4 Gbps. The output module is
divided into output planes corresponding to the output lines.
Each output plane includes a plurality of buffers
corresponding to the supportable classes.

Each output port (OPi) of the core switch module 10 is
associated with two small output buffers, one, RTi,
designated for real-time traffic and the other, NRTi, designated for
non-real-time traffic. In the preferred embodiment, each
output buffer, RTi or NRTi, can store on the order of two
hundred cells. At the output of each unicast output port, the
output lines for the non-real-time and real-time buffers are
combined with the corresponding output lines of the
multicast output port. During each cell time, (at most) one cell is
transmitted from the output of a unicast output port OPi to
its corresponding output module OMi. With reference to
FIG. 1, the order of priority is as follows: 1) multicast
real-time traffic; 2) unicast real-time traffic; 3) multicast
non-real-time traffic; 4) unicast non-real-time traffic.

While the core switch module provides the basic
hardware to perform unicast and multicast switching, in the
preferred embodiment most of the intelligence of the switch
resides in the input modules (IMi) and output modules
(OMi). Each input/output module is equipped with a
scheduler, a queue manager, and a large buffer space. Tight
coupling between the connection admission controller and
the input/output modules ensures that each queue flow meets
its quality-of-service requirements.

The IM is responsible for ATM cell header translation and
buffering of incoming cells in queues organized by VC (FIG.
2a), class and destination core switch output port (OPi) (FIG.
2b), or class and destination output line (FIG. 2c). The term
queue flow is used to represent the aggregate traffic of all
connections corresponding to a given queue. The queues are
generic, with programmable QoS that can be flexibly
assigned to any traffic class by the connection admission
controller (CAC) (see FIG. 3). The class queues are further
classified as real-time or non-real-time. During each cell
time, a scheduler in the input module IMi selects one (if any)
of the queues. From the selected queue, the head-of-line cell
is transmitted over the TDM bus to the destination output
port OPi. The queue manager allocates cell buffers to queues
and discards cells when buffer thresholds are exceeded. Cell
queuing in the input module IMi is designed to avoid
congestion at an output port in the core switch module.
Congestion may occur when the sum of the queue flows from
different input modules IMi exceeds the capacity C at an
output port OPi. Under such circumstances, the output port
OPi becomes a bottleneck point.

The architecture of the output module is similar to that of
the input module. Each output module operates as an
intelligent demultiplexer, basically independent of the rest of the
switch. In the output module, OMi, cells are queued
according to class and destination output line. Cell queuing in the
output module OMi results from congestion at an output line
OLi attached to the output module OMi; i.e., the sum of
queue flows to the output line may exceed its capacity. Thus,
the output line, OLi, is another potential bottleneck point for
a queue flow in the switch due to rate mismatch. Internal
switch congestion at the output port OPi and output line OLi
bottleneck points is controlled by means of intelligent
scheduling and queue management within the input/output
modules.

FIG. 3 exemplifies with more specificity the arrangement
of the input and output modules. The input module 30 is
depicted as 16 planes corresponding to 16 output ports
OP1-OP16. Each of the planes has a plurality (only four
shown, but many more may be present) of identical buffers
32. These buffers can be programmed by the CAC 33 to
correspond to various classes having respective QoS
requirements. In the exemplified input module 30, the four
buffers are assigned to CBR, VBR, ABR and UBR,
respectively. The CAC 33 also provides the time stamps for the
cells in the buffers. The core switch module 34 includes the
TDM bus 35 which is connected to a plurality of buffered
output ports i. In the example depicted, each output port has
two buffers: real time buffer RT and non-real time buffer NRT.
The output buffers are connected to the output modules. Each
output module is divided into planes OL1-OLk, corresponding to
the output lines. These output planes are also provided with
a plurality of programmable buffers and a scheduler.
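
The per class queuing according to output port, and the condition under which an output port becomes a bottleneck, can be sketched as a small data structure. This is an illustrative model only: the class `InputModule`, its method names, and the function `is_bottleneck` are hypothetical, and the four classes follow the example of FIG. 3.

```python
from collections import deque

NUM_OUTPUT_PORTS = 16                    # planes, one per output port
CLASSES = ("CBR", "VBR", "ABR", "UBR")   # buffers within each plane


class InputModule:
    """One plane per destination output port, one buffer per class."""

    def __init__(self):
        self.planes = [
            {traffic_class: deque() for traffic_class in CLASSES}
            for _ in range(NUM_OUTPUT_PORTS)
        ]

    def enqueue(self, cell, output_port, traffic_class):
        # Route the cell to the plane of its destination port, then
        # to the buffer of its class within that plane.
        self.planes[output_port][traffic_class].append(cell)

    def backlog(self, output_port):
        # Cells currently queued for one destination output port.
        return sum(len(q) for q in self.planes[output_port].values())


def is_bottleneck(queue_flow_rates, capacity):
    """An output port is congested when the queue flows directed at
    it from all input modules together exceed its capacity C."""
    return sum(queue_flow_rates) > capacity


im = InputModule()
im.enqueue("cell-a", output_port=3, traffic_class="CBR")
im.enqueue("cell-b", output_port=3, traffic_class="UBR")
congested = is_bottleneck([1.0, 1.0, 0.6], capacity=2.4)  # rates in Gbps
```

Because cells for different output ports never share a buffer, a congested port only backs up its own plane.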

2. Multicasting 
A feature of the inventive switch is its ability to efficiently
support multicasting alongside unicasting. A unicast
connection within the switch originates at an input line to an
input module IMi and terminates at an output line at an
output module OMi. A multicast connection originates at an
input module IMi and may have multiple output line
destinations belonging to one or more output modules OMi. The
core switch module provides the capability for supporting
multicasting among multiple output modules OMi. A cell
which is to be multicast to two or more output modules OMi
is transmitted from an input module IMi to the multicast
output port MOP. The multicast cells are distinguished as
either real-time or non-real-time, and a single copy of the
cell is stored in the corresponding buffer in the multicast
output port MOP. Duplication of the cell occurs at the output
of the multicast output port MOP, just prior to transmission
to the output modules OMi.

As shown in FIG. 1, real-time multicast traffic has the 
highest priority at the input to each output module OMi. (In 
FIG. 1, output priority is depicted using the order of the
arrows pointing to the vertical line which designates the
input to the output module, the top-most arrow designating
the highest priority.) In a given cell time, if there is a
multicast cell at the head of the multicast output port
real-time buffer, the cell will be duplicated and transmitted
to the output modules OMi which are involved in the
multicast. Note that no duplication of multicast cells occurs
over the TDM bus of the core switch module.

Real-time multicast traffic does not suffer any blocking 
because it has the highest priority. On the other hand, 
non-real-time multicast traffic has lower priority than
real-time multicast and real-time unicast. In a given cell time, a
non-real-time multicast cell at an output port OPi will be
blocked if there is either a real-time multicast cell at the OPi
or a unicast real-time cell in any of the unicast OPis
pertaining to the multicast.
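
The priority order at the input to an output module, and the blocking rule for non-real-time multicast cells, can be sketched as follows. The sketch models a single output module in isolation and does not model the duplication of a multicast cell toward several output modules; the buffer names and both function names are hypothetical.

```python
from collections import deque

# Priority order at the input to an output module, highest first.
PRIORITY = ("multicast_rt", "unicast_rt", "multicast_nrt", "unicast_nrt")


def select_cell(buffers):
    """Pick the cell sent toward the output module in this cell
    time: the head of the highest-priority non-empty buffer."""
    for name in PRIORITY:
        if buffers[name]:
            return name, buffers[name].popleft()
    return None


def nrt_multicast_blocked(mop_rt_pending, unicast_rt_pending, destinations):
    """A non-real-time multicast cell is blocked if a real-time
    multicast cell is waiting, or if any unicast output port among
    its destinations has a real-time cell waiting."""
    return mop_rt_pending or any(unicast_rt_pending[p] for p in destinations)


buffers = {
    "multicast_rt": deque(),
    "unicast_rt": deque(["u1"]),
    "multicast_nrt": deque(["m1"]),
    "unicast_nrt": deque(["n1"]),
}
order = [select_cell(buffers) for _ in range(4)]
```

Over four cell times the example serves the unicast real-time cell first, then the non-real-time multicast cell, then the non-real-time unicast cell, and finally finds all buffers empty.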

3. Feedback Control

Feedback control is utilized to ensure efficient operation
of the switch. Since the output port buffers in the core switch
module are small, they can quickly overflow. Therefore, two
basic feedback mechanisms are used to regulate such
overflow. The feedback mechanisms are:

1. A closed-loop feedback control that matches the
bottleneck rate and keeps utilization high while keeping the
queues small at the output port buffers.

2. A threshold-based rate feedback mechanism that is
activated when the output port buffers of the core
switch module have the potential to overflow in spite of
the first control mechanism.

The first control is achieved by dynamic rate control (DRC)
scheduling in the input modules. The second control is built
into the core switch module and is regarded as a safety
mechanism to quickly control short-term congestion at the
output port OPi bottleneck. The core switch module
provides a feedback path for broadcasting state information of
an output port OPi to all input modules IMi during each cell
time. The time for the feedback signal to propagate from an
output port to the input modules is a technology-dependent
quantity which is denoted herein by Xj (cell times).

In the preferred embodiment, the scheduler would
distribute any unused bandwidth to the input modules.
Consequently, the actual transmission rate from the input
modules may surpass the guaranteed minimum rate.
However, under certain circumstances, using all the
available bandwidth may cause congestion at certain output
ports. Therefore, in the preferred embodiment feedback
signals are used to alleviate any such congestion.
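
The first, closed-loop mechanism can be pictured with a generic discrete-time controller in which the correction to the excess rate is a proportional-derivative function of an error signal. This is only a sketch under stated assumptions: the choice of error signal (target minus measured utilization), the gains `kp` and `kd`, and the class name are hypothetical, and the actual DRC rate computation is described in the related application rather than here.

```python
class PDRateController:
    """Generic PD-style update of an excess rate (illustrative).

    Assumptions: the error is target minus measured utilization,
    and the excess rate is never allowed to go negative.
    """

    def __init__(self, target, kp, kd):
        self.target = target
        self.kp = kp            # proportional gain (hypothetical)
        self.kd = kd            # derivative gain (hypothetical)
        self.prev_error = 0.0
        self.excess_rate = 0.0

    def update(self, measured):
        error = self.target - measured
        # Proportional term reacts to the current error, derivative
        # term to how fast the error is changing.
        correction = self.kp * error + self.kd * (error - self.prev_error)
        self.prev_error = error
        self.excess_rate = max(0.0, self.excess_rate + correction)
        return self.excess_rate


controller = PDRateController(target=1.0, kp=1.0, kd=0.5)
trace = [controller.update(m) for m in (0.5, 1.0, 2.0)]
```

An under-utilized bottleneck (measurement 0.5) raises the excess rate, a bottleneck at its target trims it slightly through the derivative term, and an overloaded bottleneck (measurement 2.0) drives it back to zero.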

Preferably, there are three thresholds on the output port
buffers which generate control feedback signals: 1) stop RT,
2) shape RT, and 3) stop NRT (see FIG. 1). The stop RT
threshold indicator is set to one when the real-time buffer fill
is greater than or equal to the threshold value Th_stop;
otherwise, stop RT is zero. Similarly, stop NRT=1 if the
non-real-time queue fill is greater than or equal to Th_stop.
The stop threshold value Th_stop is chosen as the largest value
such that no buffer overflow occurs under the worst-case
assumption that all IMs transmit cells to the same output port
until the stop signal reaches the input modules. The shape
RT indicator is set to one when the real-time buffer fill is
greater than or equal to the threshold value Th_shape < Th_stop.
Table 1 shows how the control signals for an output port are
encoded in two bits, B1 and B0, and the action to be taken by
the input modules.



TABLE 1

Feedback Control Signals.

Threshold Indicator              Control Bits   Action Taken by
shape RT   stop RT   stop NRT    B1    B0       Input Modules
--------   -------   --------    --    --       ------------------
   0          0         0        0     0        Send RT, Send NRT
   0          0         1        0     1        Send RT, Stop NRT
   1          0         0        1     0        Shape RT, Send NRT
   1          1         1        1     1        Stop RT, Stop NRT

If the stop RT indicator is set to one, the appropriate feedback
signal will be activated. After the feedback propagation delay,
the signal will
reach all input modules and each input module will throttle 
the flow of cells (both real-time and non-real-time) to the 
corresponding output port. The stop NRT indicator functions 
in an analogous way for non-real-time traffic. By means of 
feedback signals, cell loss at the output port is prevented. 
Note that the stop signal results in input queueing in the 
input module. Without the stop signal, cells cannot queue in 
the input modules; rather, cell loss would occur whenever 
the output port overflows. 
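
The threshold logic can be sketched as follows. Two points are assumptions made for illustration rather than statements of the text: the stop threshold is computed by reserving room for one cell per input module per cell time of feedback delay (ignoring cells drained in the meantime), and a state in which shape RT and stop NRT are both set, which the two-bit code cannot express separately, is mapped to the most restrictive code. The all-zero code is read here as "send both". All function names are hypothetical.

```python
def stop_threshold(buffer_size, num_input_modules, feedback_delay):
    """Largest fill at which a stop signal still avoids overflow,
    assuming every input module keeps sending one cell per cell
    time to this port until the signal arrives (assumption)."""
    return buffer_size - num_input_modules * feedback_delay


def feedback_bits(rt_fill, nrt_fill, th_shape, th_stop):
    """Encode the output port state into the two bits (B1, B0)."""
    stop_rt = rt_fill >= th_stop
    shape_rt = rt_fill >= th_shape
    stop_nrt = nrt_fill >= th_stop
    if stop_rt or (shape_rt and stop_nrt):   # second case: assumption
        return (1, 1)
    if shape_rt:
        return (1, 0)
    if stop_nrt:
        return (0, 1)
    return (0, 0)


ACTIONS = {
    (0, 0): "Send RT, Send NRT",
    (0, 1): "Send RT, Stop NRT",
    (1, 0): "Shape RT, Send NRT",
    (1, 1): "Stop RT, Stop NRT",
}

# Hypothetical numbers: 200-cell buffer, 16 input modules, and a
# feedback delay of 2 cell times.
th_stop = stop_threshold(200, 16, 2)
action = ACTIONS[feedback_bits(rt_fill=120, nrt_fill=0,
                               th_shape=100, th_stop=th_stop)]
```

With these numbers the stop threshold is 168 cells, and a real-time fill of 120 cells puts the port in the shape state.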

The shape RT indicator provides a means of controlling 
congestion for real-time traffic based on pre-assigned mini- 
mum guaranteed rates for queue flows. When a shape RT 
signal is received at an input module IMi from a given 
output port OPj, all real-time queue flows corresponding to 
the output port OPj are shaped to their minimum guaranteed 
rates. That is, the real-time queues are scheduled at their
minimum rate irrespective of the amount of unused
bandwidth available for distribution. This action prevents the
real-time queue fill in the output port OPj from growing
further, while ensuring the guaranteed minimum throughput
for the real-time queue flows. Thus, a real-time queue flow
is served at a rate greater than the minimum guaranteed rate
when there is no congestion, and at a rate equal to the
minimum guaranteed rate when there is congestion. As will
be discussed further below, the DRC scheduler ensures that
stop signals for real-time queue flows are activated with
only low probability.

3. Traffic Classes and Quality-of-Service Support

3.1 Real-time Traffic 

Real-time traffic such as CBR and VBR has strict
requirements on cell delay, cell loss, and cell delay variation
(CDV). In conjunction with a connection admission control
(CAC) algorithm, the large switch architecture can provide
QoS guarantees. The CAC developed in G. Ramamurthy
and Q. Ren, Multi-Class Connection Admission Control
Policy for High Speed ATM Switches, in Proc. IEEE
INFOCOM '97 (Kobe, Japan), April 1997, and incorporated herein
by reference, may be used in the inventive switch. This CAC
computes the bandwidth required by a real-time queue flow
to satisfy the QoS requirements of all connections within the
flow. Thus, the CAC takes into account statistical
multiplexing of connections within a given queue flow (statistical
multiplexing takes into account that the bandwidth required
for a collection of streams is less than the sum of the
individual bandwidths required for each stream). The
required bandwidth is computed based on the UPC values
for each connection and a nominal pre-assignment of buffers
to the flow. However, it should be noted that in the prior art
the calculated minimum rate is used only for CAC purposes
and is not sent to the scheduler.

The DRC scheduling mechanism ensures that each queue
flow receives its minimum guaranteed rate and hence the
QoS is guaranteed for all connections within the flow. The
minimum rate is guaranteed on a short time-scale because
real-time traffic has strict priority over non-real-time traffic
in the core switch element, and the shape feedback mechanism
from the output ports ensures that queue flows receive their
minimum guaranteed rates even under congestion
conditions. That is, under congestion conditions the shape
feedback mechanism halts distribution of any unused bandwidth,
thereby reducing the rates to the minimum guaranteed rates
to alleviate the congestion while ensuring the minimum
guaranteed rate. Further, queues forced to operate at
minimum rate (in shape mode) have their priority bit set when
they become eligible for service.

3.2 Non-real-time Traffic 

The two main traffic classes targeted as non-real-time are 
ABR and UBR. These classes generally do not have strict 
QoS requirements, but they may have minimum throughput 
requirements. The minimum rate for a non-real-time queue 
flow is just the sum of the minimum throughput over all 
connections within the flow. The large switch scheduler is 
able to guarantee the minimum rate for each non-real-time 
flow via DRC scheduling. Any unused bandwidth at a switch 
bottleneck is distributed among competing queue flows 
(both real-time and non-real-time). The distribution of 
unused bandwidth depends on the weights w_i assigned to the 
different traffic classes. Preferably, these rates are assigned 
dynamically. 

UBR sources are not rate-controlled and can cause loss of 
throughput in conventional switch architectures. With 
dynamic rate control, UBR queues receive their minimum 
rates plus a fair share of the unused bandwidth. ABR sources 
are rate-controlled via a closed-loop feedback mechanism. 
At the switch, an explicit rate (ER) value is computed at each 
bottleneck point in the connection flow. In the large switch 
architecture, an ABR ER value is computed at the output 
port bottleneck and at the output line bottleneck in the 
output modules. Various methods may be used to compute 
the ER; however, in the preferred embodiment the ABR ER 
values are computed in a similar manner to the computation 
of the DRC rates. 

4. Dynamic Rate Control 

Dynamic rate control (DRC) is the mechanism for cell 
scheduling used in the large capacity switch. For a full 
understanding of DRC, reference should be made to the 
related U.S. appln. Ser. No. 08/924,820; however, for the 
reader's convenience a summary of DRC basic principles, as 
applied to the inventive switch, is provided hereinbelow. 
The reader may also wish to review A. Kolarov and G. 
Ramamurthy, Design of a Closed Loop Feed Back Control 
for ABR Service, in Proc. IEEE INFOCOM '97 (Kobe, 
Japan), April 1997, which describes a feedback control for 
ABR service as applied to an ATM network. It should be 
kept in mind, however, that in the following description the 
feedback is applied over the ATM switch. 

The basic principle is that each class queue is treated like 
a virtual source whose service rate is dynamically adjusted 
to reflect the unused bandwidth available at a bottleneck 
point in the switch. Specifically, each class is serviced at its 
guaranteed minimum rate plus a dynamically adjusted fair 
share of any available unused bandwidth. Scheduling con- 
sists of computing the queue service rate and implementing 
the rate shaping function for all queues. An important feature 
of this approach to scheduling is that all queues are reduced 
to a generic set of queues, and the QoS perceived by the 
class is determined by the bandwidth guaranteed for the 
class. The generic queues are assigned to classes by the 
CAC. 

4.1 Guaranteed Minimum Rate 

In order to provide QoS for a given connection at a switch, 
there must be a mapping between the traffic characteristics 
and the bandwidth resources at a switch. For a given 
connection i, the traffic specification may comprise a set of 
QoS requirements in terms of cell loss probability, delay, 
and/or delay jitter. Rather than implementing an algorithm 
which accounts for all of the specified requirements, it is 
simpler to map the requirements into a single variable which 
would account for all the requirements. In the preferred 
embodiment of DRC, the requirements are mapped onto a 
bandwidth or rate, M_i, such that if connection i receives the 
rate M_i, then its QoS requirements will be met. Preferably, 
M_i would be provided by the Connection Admission Control 
(CAC) algorithm. The rate should be approximated to a 
sufficient degree of accuracy so as to incorporate all the QoS 
requirements. 

Once M_i is determined, the scheduler ensures that connection 
i receives its minimum rate. This in turn ensures that 
connection i will be guaranteed its QoS. Thus, the scheduler 
is simplified by having to account for only a single variable. 

Consider a concentration point in the network where N 
connections are multiplexed onto a link of capacity C. 
Clearly, we must have 

    \sum_{i=1}^{N} M_i \le C. 

Using a simple First-In First-Out (FIFO) scheduler provides 
no way of guaranteeing that each connection gets its 
assigned share, M_i, of the bandwidth. For example, a given 
connection may transmit at a rate higher than its assigned 
share M_i and thereby take away bandwidth from another 
connection. 

A simple way to ensure that no connection uses more 
bandwidth than its assigned share is to limit the peak rate of 
each connection i to M_i. This can be done, for example, by 
shaping the peak rate of each connection to its assigned 
minimum rate using known methods, such as leaky buckets. 
FIG. 4 shows N queues with each queue i shaped to a 
respective rate M_i, i=1, ..., N. The shaped traffic streams 
are then multiplexed and served in FIFO order at a rate equal 
to or below the downstream buffer's rate C. 
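The static shaping of FIG. 4 can be illustrated with a minimal sketch. It assumes a simple spacing shaper, in which consecutive cells of a connection are released no closer than 1/M_i apart, in place of a full leaky-bucket implementation. The function names `admissible` and `shape_departures` are hypothetical and do not appear in the specification.

```python
def admissible(min_rates, capacity):
    """Check the admission condition: the sum of the M_i must not exceed C."""
    return sum(min_rates) <= capacity


def shape_departures(arrivals, rate):
    """Peak-rate shape one connection to `rate` cells per unit time.

    `arrivals` is a non-decreasing list of cell arrival times. Each cell is
    released at the later of its arrival time and the earliest conforming
    time, so consecutive releases are at least 1/rate apart (assumed
    spacing shaper, not a full leaky bucket).
    """
    interval = 1.0 / rate
    departures = []
    next_ok = 0.0
    for t in arrivals:
        d = max(t, next_ok)
        departures.append(d)
        next_ok = d + interval
    return departures
```

For instance, a burst of three cells at time 0 followed by a cell at time 5, shaped to 0.5 cells per unit time, is released at times 0, 2, 4 and 6: the burst is spread out, and the fourth cell is delayed even though the link may be idle.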
Peak rate enforcement ensures that minimum rate guarantees 
are satisfied for all connections. Under this scheduling 
discipline, however, connection i can never use more 
bandwidth than its assigned M_i, even when bandwidth is 
available. For example, if connection i is the only active 
connection sharing the link bandwidth C, it will be limited 
to using bandwidth M_i, even though the entire link capacity 
C is available. Moreover, if the minimum rate M_i is computed 
assuming that statistical multiplexing takes place, the 
QoS of connection i is guaranteed assuming sharing of the 
overall bandwidth since the link capacity may be exceeded 
with small probability. This type of sharing cannot occur if 
the connection peak rates are limited to the values M_i. The 
assigned bandwidth M_i, computed under the assumption of 
statistical multiplexing gain, may be insufficient to guarantee 
QoS when no statistical multiplexing takes place. 

The rate control mechanism employed in the static rate- 
based schedulers may be termed open-loop. A consequence 
of this is that the scheduler is non-work conserving; i.e., a 
cell time on the output link may be idle even when there 
may be cells to serve in the system. Thus, it is possible that 
available bandwidth can be wasted in these scheduling 
disciplines. The DRC scheduler remedies this problem via a 
closed-loop control mechanism. 

The basic principle of the DRC scheduler is illustrated in 
FIG. 5. As before, each traffic stream is peak-rate shaped 
before entering a common FIFO queue served at the link rate 
C. However, the shaping rates R_i are dynamically computed 
to reflect the amount of bandwidth available on the link. 
Specifically, connection i is peak-rate shaped to R_i, where 

    R_i = M_i + w_i E, 

and E is the estimated unused bandwidth at the bottleneck 
point (also referred to herein as the excess rate or excess 
bandwidth), and w_i \ge 0 is an optional weighting factor which 
may be assigned statically or dynamically. Since E \ge 0, we 
have R_i \ge M_i. Thus, connection i is guaranteed the minimum 
rate M_i, but may transmit at a higher rate when unused 
bandwidth is available. Conversely, during congestion the 
scheduler may drive E to zero, thereby serving the queues 
only at their guaranteed minimum rates until congestion is 
relieved. 
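As a minimal sketch of the rate assignment just described, the following computes the shaping rate of each connection as its minimum rate plus a weighted share of the excess rate E. The function name `drc_rates` and the optional cap at the link capacity are assumptions made for illustration; negative values of E are treated as zero.

```python
def drc_rates(min_rates, weights, excess, capacity=None):
    """Return shaping rates R_i = M_i + w_i * E for each connection.

    `excess` is the estimated unused bandwidth E; values below zero are
    clamped to zero so that R_i never falls below M_i. If `capacity` is
    given, each rate is additionally capped at the link capacity C
    (an assumption of this sketch).
    """
    e = max(excess, 0.0)
    rates = []
    for m, w in zip(min_rates, weights):
        r = m + w * e
        if capacity is not None:
            r = min(r, capacity)
        rates.append(r)
    return rates
```

With minimum rates of 10 and 20, weights of 1 and 2, and E = 5, the shaping rates are 15 and 30. When E is driven to zero, the rates fall back to the guaranteed minimums of 10 and 20.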

4.2 Closed-loop Rate Control 

FIG. 6 depicts a closed-loop rate control system which 
detects the available bandwidth to be distributed among 
connections in the DRC scheduler. Assume that time is 
discretized into intervals of length T. Let X_i(n) denote the 
number of cells generated by connection i in the nth time 
interval. The quantity Q(n) represents the number of cells in 
the second-stage buffer. In the first stage, each connection 
stream is shaped according to the rate 

    R_i(n) = \min(M_i + w_i E(n), C). 

The controller computes E(n) such that Q(n) is kept close to 
the target queue threshold Q_0. In equilibrium, the aggregate 
flow rate to the second stage should match the link capacity 
(provided there is sufficient source flow). 

Formulated in this way, the computation of E(n) becomes 
a control problem. This somewhat resembles the problem of 
computing an explicit rate (ER) for ABR service, as disclosed 
in the above noted ABR paper by Kolarov and 
Ramamurthy. However, the implementation of the ER 
within the switch is simplified. For the purposes of the DRC 
scheduler, a single controller is sufficient. Since the rate 
control occurs locally within the switch, the feedback delay 
(see FIG. 6) is small relative to the sampling interval T. This 
delay is negligible, in contrast to flow control in ABR, where 
feedback delays must be taken into account. Thus, while the 
control disclosed in the Kolarov and Ramamurthy paper can 
be implemented only for non-real-time ABR service, the 
present control can be implemented for real time and 
non-real time services. 

To simplify the design of the controller, suppose that there 
is a single source at stage one with infinite backlog (i.e., it 
can always fill any available link capacity). Let E(n) denote 
the rate computed by the controller at time n. In this case, 
R(n) is also the flow rate from stage one to stage two for the 
single source. Let \epsilon(n) = Q(n) - Q_0 denote the error between 
the queue length at time n and the target queue length Q_0. 
The general form of the discrete-time PD controller is as 
follows: 

    E(n+1) = E(n) - \alpha_0 \epsilon(n) - \alpha_1 \epsilon(n-1) - \dots - \alpha_u \epsilon(n-u) 
             - \beta_0 E(n) - \dots - \beta_v E(n-v),    (1) 

where \alpha_i, i=1, ..., u and \beta_i, i=1, ..., v are real-valued 
coefficients. For the DRC scheduler, it is preferable to use a 
simple two-parameter filter: 

    E(n+1) = E(n) - \alpha_0 \epsilon(n) - \alpha_1 \epsilon(n-1).    (2) 

Thus, the controller is simplified, allowing for speed up of 
rate shaping. 
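One step of the two-parameter filter of equation (2) may be sketched as follows. The clamping of the result to the range [0, e_max] is an assumption added here so that the excess rate stays non-negative and bounded; it is not stated in the text. The function name `pd_update` and its parameter names are hypothetical.

```python
def pd_update(e_now, err_now, err_prev, alpha0, alpha1, e_max):
    """One step of E(n+1) = E(n) - alpha0*eps(n) - alpha1*eps(n-1).

    `err_now` and `err_prev` are the queue errors eps(n) = Q(n) - Q0 and
    eps(n-1). The result is clamped to [0, e_max], which is an assumption
    of this sketch rather than part of equation (2).
    """
    e_next = e_now - alpha0 * err_now - alpha1 * err_prev
    return min(max(e_next, 0.0), e_max)
```

When the second-stage queue is above its target (positive error), the excess rate is reduced; when the queue is below target (negative error), the excess rate grows until it reaches the assumed upper bound.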

4.3 Overload Control 

The closed-loop controller adjusts the rate E(n) such that 
the error \epsilon(n) = Q(n) - Q_0 decreases in absolute value. 
However, the dynamics of the aggregate input traffic R(n) 
may be faster than that of the closed-loop controller. The 
queue length, Q(n), in the second stage may grow to a large 
value before the closed-loop controller can bring it close to 
the target value Q_0. This is caused by connections which 
transmit at rates significantly larger than their minimum 
guaranteed rates M_i. A large value of Q(n) can adversely 
affect the delay performance of connections which are 
transmitting at rates close to their minimum rates. Since the 
response time of the closed-loop controller may be too slow 
to prevent overload at the second stage of the scheduler, in 
the preferred embodiment a separate overload control 
mechanism is provided. 

When the second stage buffer exceeds a certain shape 
threshold, a feedback shape signal is transmitted to the DRC 
scheduler. This shape signal causes the scheduler to shape all 
queues at their guaranteed minimum rates, M_i, and stop 
distribution of unused bandwidth. This action provides a 
quick overload control mechanism allowing relief of 
congestion. Unlike the prior art backpressure signal, the novel 
shape signal has the important property of allowing the 
queues to transmit at their guaranteed minimum rates in the 
presence of congestion. Still, a stop backpressure signal 
which forces all queues to stop all cell transmission to the 
second stage queue may also be used. 
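The effect of the shape and stop signals on a queue's service rate can be sketched as below. The sketch assumes that the stop threshold is higher than the shape threshold and that the signals are derived purely from the second-stage queue length; the names `overload_signal` and `service_rate` and the string labels "go", "shape" and "stop" are hypothetical.

```python
def overload_signal(queue_len, shape_threshold, stop_threshold):
    """Map the second-stage queue length to a feedback signal.

    Assumes shape_threshold < stop_threshold.
    """
    if queue_len > stop_threshold:
        return "stop"
    if queue_len > shape_threshold:
        return "shape"
    return "go"


def service_rate(min_rate, weight, excess, signal):
    """Service rate of one queue under the given feedback signal."""
    if signal == "stop":
        return 0.0                       # all transmission halted
    if signal == "shape":
        return min_rate                  # guaranteed minimum only
    return min_rate + weight * excess    # minimum plus share of excess
```

Under the shape signal a queue keeps its guaranteed minimum rate, whereas under the stop signal it is halted entirely, which is why the stop signal is the less desirable of the two for real-time traffic.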

More specifically, in the prior art the simple stop/go 
backpressure control mechanism results in an equalization 
of throughput for all queues; i.e., each queue will achieve a 
throughput of C/N. Moreover, the frequent occurrence of 
stop/go signals introduces cell delay variation (CDV) which 
can adversely affect the QoS experienced by real-time 
connections. Therefore, in the preferred embodiment two 
signals are used: shape signal and stop signal. The shape 
signal threshold is set lower than the stop signal threshold, 
and allows relief of congestion while permitting transmission 
at the guaranteed minimum rate. The stop signal 
threshold is set very high so that, because of the use of shape 
signal, the stop signal would be activated very rarely. That 
is, the stop signal would serve as a last resort pressure relief 
valve. 

4.4 DRC Scheduling for the Large Switch 

4.4.1 Single-loop Feedback 

In the input modules of FIG. 2b, cells are buffered in 
queues according to traffic class and destination output port 
of the core switch module (hereinafter such an arrangement 
is designated as class/OP queue). Let (i,j,l) denote the queue 
corresponding to class i and destination output port j in input 
module l (when the input module is obvious by context, we 
will sometimes use the abbreviated notation (i,j) to refer to 
a particular queue within the input module). Each class/OP 
queue is regarded as a virtual source whose flow rate can be 
controlled. The connection admission control (CAC) algorithm 
provides the switch with a guaranteed minimum 
service rate for each queue. The value of this minimum rate 
is determined by the QoS objective of the class, the number 
of admitted connections and their traffic characteristics. Let 
M_ijl denote the guaranteed minimum service rate for queue 
(i,j,l). The input module scheduler must ensure that each 
input module queue achieves a throughput greater than or 
equal to the guaranteed minimum rate. For real-time traffic, 
the throughput guarantee must be provided on a relatively 
fast time-scale. While non-real-time traffic must also receive 
the minimum guaranteed throughput, the guarantee can be 
provided on a slower time-scale. 

The sum of the minimum rates must not exceed the line 
capacity at each output port and at each input module; i.e., 

    \sum_{i,l} M_{ijl} \le C 

for each OP j, and 

    \sum_{i,j} M_{ijl} \le C 

for each IM l. The rate guarantees can be satisfied by 
statically rate shaping the queue flows, according to the 
minimum rates. However, as explained above, under a static 
scheduling discipline a queue can never transmit at a rate 
higher than its assigned minimum rate, even if bandwidth is 
available at the output port bottleneck. The static discipline 
is non-work conserving with respect to the output port 
bottleneck (i.e., a cell time at an output port may be left idle 
even when there is an input module queue with cells to send) 
and there is no statistical multiplexing between queue flows. 
Note that statistical multiplexing does occur between 
connection flows within a queue flow. 

To achieve statistical multiplexing between queue flows 
it is preferable to use the dynamic rate control (DRC) 
scheduling. At output port j, which is a potential bottleneck, 
an excess rate E_j is computed based on the traffic utilization 
at the output port and queue length information at all input 
module queues corresponding to destination OP j. The 
method for computing E_j will be discussed in section 6 
below. A dynamic rate, R_ijl, is assigned to queue (i,j,l) 
according to: 

    R_{ijl} = M_{ijl} + w_i N_{ijl} E_j, 

where w_i is a pre-assigned weighting factor associated with 
class i and N_ijl represents the number of active connections 
associated with queue (i,j,l). For real-time connections, N_ijl 
is simply the number of connections assigned to queue (i,j,l). 
For non-real-time connections, N_ijl is computed as an estimate 
of the number of active connections assigned to queue 
(i,j,l), since some actual connections may sit idle. The value 
of w_i is a policy decision made by the multiclass CAC. Thus, 
the rate R_ijl consists of a static part determined by the CAC, 
M_ijl, and a dynamic part w_i N_ijl E_j. The static part provides the 
minimum rate guarantee, while the dynamic part allows 
queue flows to utilize any unused bandwidth at the output 
port bottleneck in a fair manner (determined by the weights 
w_i) and without overloading the bottleneck. 

DRC scheduling is also applied analogously to the output 
module queue schedulers. In the output modules, cells are 
queued according to traffic class and output line. Using 
similar notation to that introduced earlier for input module 
scheduling, let (i,j,l) denote the queue corresponding to class 
i and destination output line j in output module OM_l. The 
dynamic rate, R_ijl, assigned to queue (i,j,l) in OM_l is 
determined according to 

    R_{ijl} = M_{ijl} + w_i N_{ijl} E_{jl}, 

where M_ijl is the minimum guaranteed rate for queue (i,j,l), 
N_ijl is the number of active connections assigned to queue 
(i,j,l), and E_jl is the DRC rate for line j in OM_l. 

4.4.2 Dual-loop Feedback 

In section 4.4.1, the DRC rate E_j is computed based on the 
bottleneck at output port OP_j and used in the computation of 
rate R_ijl as discussed above. A second bottleneck is at the 
output line of the output module due to rate mismatch. If an 
output line gets overloaded, queues in the output module can 
grow, leading to cell loss and underutilization. This problem 
can be alleviated by queueing cells at the input module 
according to output line within an output module, as exemplified 
in FIG. 2c. More precisely, let (i,j,k,l) denote a queue 
in an input module corresponding to class i, destination 
output module j, destination output line k (within output 
module j), and input module l. The notation (i,j,k) represents 
a queue of class i in destination OM_j and output line k. In this 
scheme, the number of queues in the input module increases 
by a factor of L, where L is the number of output lines per 
output module. We shall briefly discuss how DRC can 
further improve switch performance in this case via the 
addition of a second feedback loop from the input module to 
the output line bottleneck. 

In the second feedback loop, a rate, E_jk, is computed based 
on the number of cells queued at output line k in OM_j. That 
is, E_jk represents the free bandwidth available at the bottleneck 
corresponding to output line k in output module j. The 
rates E_jk can be conveyed to all IMs and used to compute the 
dynamic rate, R_ijkl, for queue (i,j,k,l) as follows: 

    R_{ijkl} = M_{ijkl} + w_i N_{ijkl} \min(E_j, E_{jk}), 

where M_ijkl is the guaranteed minimum rate for queue 
(i,j,k,l). By computing the dynamic rate in this way, both the 
bottleneck at output module j and output line k (in output 
module j) are taken into account. The rates controlling the IM queues are 
determined by two feedback loops: the first extending 
to a given output module and the second extending to a 
given output line within the output module. This keeps the 
queues in the output module small, while maintaining high 
utilization. Thus, most of the cell queueing is moved to the 
input module in a controlled manner by the DRC mechanism. 

5. Design of the Scheduler 

5.1 Rate-based Scheduling 

In each cell time, the IM scheduler determines the next 
queue from which a cell can be transmitted to its destination 
OP. In the DRC scheme, the scheduling is based on the 
values of dynamically computed rates for each queue. Given 
these dynamic rates, the IM implements a two-stage algorithm 
to determine the next queue to serve: 

Stage one: Virtual rate shaping 

Stage two: Service scheduling 

Virtual rate shaping and service scheduling are discussed in 
detail in subsections 5.2 and 5.3, respectively. 

We shall identify a queue with the notation (i,j), standing 
for class i (i=0, ..., 7) and destination OP j (j=0, ..., 15). 
Associated with queue (i,j) are the following quantities: 

TS_ij: timestamp. This is stored in a 20-bit register with a 
12-bit integer part and an 8-bit fractional part. The 
timestamp is updated whenever the queue is scheduled 
or rescheduled for service. TS_ij is initialized to zero. 

AQ_ij: actual queue size. This is the number of cells stored 
in queue (i,j). AQ_ij is incremented whenever a cell 
arrives to queue (i,j) and is decremented whenever the 
scheduler chooses queue (i,j) to be served in a given cell 
time. AQ_ij is stored in a 16-bit register and is initialized 
to zero. 

VQ_ij: virtual queue size. This is the number of cells stored 
in the virtual queue for queue (i,j). VQ_ij is stored in an 
8-bit register and is initialized to zero. In the preferred 
embodiment, VQ_ij is never incremented beyond 255. 

WF_ij: wraparound flag. This is a two-bit flag which 
indicates the cycle of the current clock. It is initialized 
to one. 

M_ij: minimum guaranteed rate. This quantity is provided 
by the CAC and is stored as an interval, J_ij. 

J_ij: interval for minimum guaranteed rate. 
This is the inverse of the minimum guaranteed rate, M_ij, 
for queue (i,j), stored as a 20-bit number in memory with a 
12-bit integer part and an 8-bit fractional part. 

E_j: DRC rate. The DRC rate is computed at OP j based on 
the global virtual queue for OP j. This value is not 
stored in the IM. 

w_i: weighting factor for class i. This is an 8-bit integer used 
in the computation of the shaping rate R_ij. 

R_ij: computed shaping rate. This is the rate computed by 
the DRC algorithm for shaping the traffic of queue (i,j). 
Its value is stored in the form of an interval, I_ij. 

I_ij: interval for computed rate. This is the inverse of the 
rate, R_ij, computed by the DRC algorithm for queue 
(i,j). 

P_ij: scheduling priority. This is a one-bit flag. P_ij = 1 
indicates that queue (i,j) has priority for finding the next 
cell to schedule to the virtual queue. 

PV_ij: service priority from virtual queue. This is a one-bit 
flag. PV_ij = 1 indicates that queue (i,j) has priority for 
finding the next cell to service (transmit to OP) from the 
virtual queue. 

S_ij: shape signal. When this signal is set, queue (i,j) is to 
be scheduled at the MCR rate. The signal S_ij is set equal 
to one under the following conditions: 

1. Queue (i,j) is of type RT and the shape RT signal is 
set to one. 

2. Queue (i,j) is of type RT and local VQ count exceeds 
a threshold: \sum_i VQ_{ij} \ge Th_{VQ}. 

3. The product N_ij x w_i x E_j = 0. 

Note that NRT traffic is shaped only in case 3. 

N_ij: number of active VCs. For RT queues, this number is 
simply the number of VCs as recorded by the CAC, 
since they are all presumed active. For NRT queues 
which may idle for long periods (i.e., UBR and ABR), 
the number of active NRT queues is estimated via a 
counting mechanism to be described in Subsection 5.5. 

It should be noted that although in the above described 
preferred embodiment the timestamps are assigned per 
queue, it is also possible to perform rate shaping by 
assigning timestamps per cell. In such an embodiment, all 
cells having timestamps equal to or less than the current time 
CT would be eligible for service. 

Current Time and Wraparound 

In each IM there is a 12-bit counter CT which stores the 
current time, where each time tick is one cell time at 2.4 
Gbps, i.e., 175 ns. One cycle is defined as 2^12 cell times, or 
the time it takes until CT wraps around, starting from CT=0. 
Whenever CT wraps around, the WF flag for each queue 
is incremented by one. This operation can be 
done in a single cell time. Together, the timestamp TS_ij and 
the flag WF_ij indicate the value of time maintained for queue 
(i,j) relative to the current time. The meanings of the 
four possible values of WF are summarized in Table 2. 

TABLE 2 

Values for WF_ij    Meaning: TS_ij is ... 
0                   one cycle ahead of CT 
1                   in same cycle as CT 
2                   one cycle behind CT 
3                   at least two cycles behind CT 

TS_ij together with WF_ij makes it possible to 
determine the relative values of the queue timestamp and the 
current time even when wraparound occurs. 

5.2 Virtual Rate Shaping 

Virtual rate shaping is based on a timestamp, TS_ij, 
assigned to each queue (i,j). The timestamp TS_ij is updated 
such that the flow rate of queue (i,j) is limited to either R_ij 
or M_ij. The dynamic rate R_ij is used in the timestamp 
computation if no congestion conditions occur (see below); 
otherwise the minimum rate M_ij is used. If the minimum rate 
is used in the timestamp computation for queue (i,j), then the 
priority bit P_ij is set to one. 

Any queue with a timestamp which is less than or equal 
to the current time is considered eligible. The current time is 
a free-running clock in which the interval between clock 
ticks is equal to one cell time. In each cell time, the IM 
scheduler selects the next eligible queue with priority given 
to queues (i,j) which have the priority bit P_ij set to one. Once 
an eligible queue, say queue (i,j), is selected, the virtual 
queue counter VQ_ij is incremented by one. The virtual queue 
counter counts the number of cells that are eligible for a 
queue but have not yet been transmitted. Let AQ_ij denote the 
number of cells in queue (i,j). If there are still cells in the 
active queue which have not been scheduled in the virtual 
queue, i.e., if AQ_ij > VQ_ij, then the timestamp TS_ij is updated 
to reflect when the next cell in the actual queue should be 
scheduled. The timestamp is also updated when a cell arrives 
to an empty queue (i,j). 

The scheduling algorithm sets the priority bit P_ij under 
two conditions: 

1. The timestamp TS_ij has fallen behind the current time 
by more than 1/M_ij. 

2. It is necessary to shape the queue flow at the minimum 
guaranteed rate, M_ij, in order to control congestion at 
OP j. 

Case 1 occurs because the local IM scheduler is unable to 
keep up with the traffic and the computed shaping rates. 
Since it takes time for the rate computations to converge to 
the appropriate rate, it may happen that the instantaneous 
sum of shaping rates at the IM scheduler may exceed the line 
rate C, i.e., 

    \sum_{i,j} R_{ij} > C. 

If this condition occurs, multiple queues may become eligible 
in the same cell time. 

In case 2, the queue is scheduled under shape mode, i.e., 
the queue service rate changes from R_ij to M_ij. Shape mode 
is invoked under the following conditions: 

1. The queue is real-time and either a shape or stop signal 
from the OP is in effect. 

2. The queue is real-time and the local sum of VQ counts 
for the destination OP exceeds a threshold. 

3. The computed dynamic rate R equals the minimum rate 
M. 

The first two conditions indicate the occurrence of congestion 
for real-time traffic at an output port. Switching to 
shape mode quickly alleviates such congestion, while maintaining 
the minimum rate guarantee. Giving priority to 
queues operating in the shape mode ensures that their 
minimum rates will be guaranteed. When an eligible queue 
(i,j) is selected by the stage one scheduler, its virtual queue 
counter VQ_ij is incremented by one. If the priority bit P_ij is 
set to one, then the stage two priority bit PV_ij is set to one. 

When a cell arrives to an empty queue (i,j), the queue 
becomes eligible for service and the timestamp TS_ij must be 
recomputed to reflect when the queue should next be served. 
This process is called scheduling. After the stage one scheduler 
chooses a queue (i,j), the virtual queue counter VQ_ij is 
incremented by one. If there are still cells in the active queue 
which have not been scheduled in the virtual queue, i.e., if 
AQ_ij > VQ_ij, then the timestamp TS_ij should be updated to 
reflect when the next cell in the actual queue should be 
scheduled. This process is called rescheduling. 

Scheduling 

The algorithm for scheduling is depicted in FIG. 7 in flow 
chart form. In step S700, when a cell arrives at queue (i,j), 
the counter AQ_ij is incremented by one (the notation ++ is 
used to indicate an increment by one). In step S710, it is 
checked whether AQ_ij - VQ_ij = 1. If so, then the queue is 
eligible to be scheduled and the process continues to step 
S720. Otherwise, the routine terminates at step S725. 

The variable CCT is a 14-bit integer variable which stores 
the value of the current time relative to the timestamp TS_ij. 
Note that WF_ij is initialized to one. Referring to Table 2, 
observe that if AQ_ij - VQ_ij = 1 then necessarily WF_ij \ge 1. In 
step S720 CCT is computed as 

    CCT = CT + ((WF_{ij} - 1) \ll 12), 

where \ll denotes the binary shift left operation (i.e., 
multiplication by 2^12). 

The next step in the scheduling algorithm is to compare 
CCT with the 13-bit integer TS_ij + J_ij. In step S730 if the 
condition 

    CCT < TS_{ij} + J_{ij}    (3) 

is false, then queue (i,j) is considered to be running late, i.e., 
it is behind the current time by at least one interval at the 
minimum guaranteed rate (J_ij). Hence, the routine proceeds 
to step S740 wherein the priority bit P_ij is set to one, since 
the queue traffic is conforming to its minimum guaranteed 
rate. Then, in step S750 queue (i,j) is scheduled at the current 
time CT, i.e., TS_ij = CT. 

However, if (3) is true in step S730, the routine would 
proceed to step S760 to check on the value of S_ij. The queue 
will be scheduled at either the minimum guaranteed rate, 
M_ij, or the computed rate, R_ij, depending on the value of S_ij. 

If in step S760 S_ij = 0, i.e., the OP is not overloaded, then 
queue (i,j) is scheduled at the computed rate, R_ij, with 
priority P_ij = 0 in step S770. In this case, if the condition 

    CCT < TS_{ij} + I_{ij}    (4) 

is false, then queue (i,j) is 
scheduled at the current time CT. Otherwise, the 
timestamp is updated at step S790 as 

    TS_{ij} = TS_{ij} + I_{ij}. 

If in step S760 S_ij = 1, i.e., the OP is overloaded, then queue 
(i,j) is scheduled at the minimum guaranteed rate and 
given priority P_ij = 1 (step S775). In this case, in step S795 the 
timestamp is updated according to 

    TS_{ij} = TS_{ij} + J_{ij}. 

The wraparound flag WF_ij may need to be adjusted 
appropriately in step S755. 

Rescheduling 

The rescheduling algorithm is shown in FIG. 8. The 
algorithm attempts to serve and (if necessary) to reschedule 
queues which have passed their conformance times with 
respect to the current time CT. In this context, to serve queue 
(i,j) means to increment its virtual queue counter VQ_ij by 
one. The algorithm performs a round-robin search of the 
queues with priority given to queues having P_ij = 1, iterating 
over the class i=0, ..., 7 and then the destination OP 
j=0, ..., 15. In the first pass, the 
algorithm attempts to find a queue (i,j) 
with F_ij = 1 and priority bit P_ij = 1. If the first pass 
fails, then a second pass is executed to find a queue with 
F_ij = 1. The rescheduling algorithm is run until either all 
queues have been examined or a time-out occurs, indicating 
the end of the current cell time. 

The condition F_ij = 1 is true if and only if the following 
hold: 

1. AQ_ij > VQ_ij. This means that there is at least one cell in the 
actual queue which has not been scheduled in the virtual 
queue. 

2. VQ_ij < FF (hex). The counter VQ_ij is an 8-bit counter which 
stops incrementing when VQ_ij = FF (hex). Therefore, a 
maximum of 256 cells can be scheduled in a virtual 
queue. If this limit is reached, then the queue must be 
bypassed for rescheduling; the virtual queue counter 
cannot be incremented beyond 255. Note that if the limit 
on the virtual queue counter is set to 1, this in effect 
disables the virtual queue. The scheduler will still perform 
rate shaping, but the rate computation must not be based 
on the global virtual queue size. 

3. WF_ij \ge 2 or (WF_ij = 1 and TS_ij \le CT). If this condition is 
true, then queue (i,j) has passed its conformance time; i.e., 
TS_ij represents a point in time which is earlier than the 
current time as recorded in CT. 

If an eligible queue (i,j) is found in the round-robin loop, the 
first action is to increment VQ_ij by one (S820). The virtual 
queue priority bit is updated according to (S820): 

    PV_{ij} = \max(P_{ij}, PV_{ij}). 

Thus, PV_ij is set to one if 
P_ij is set. 
Next, if AQ_ij > VQ_ij (S830), the queue needs to be resched-
uled (S840). Otherwise, no rescheduling is necessary
(S815). In the rescheduling step, a temporary variable, CCT,
is computed as in the scheduling algorithm (cf. FIG. 7):

$$CCT = CT + (WF_{ij} - 1) \ll 12.$$

If CCT < TS_ij + J_ij is false, then queue (i,j) is considered to be
running behind the current time (S850). Therefore, in order
to catch up with the current time, the queue is scheduled at
the minimum guaranteed rate, M_ij, with the priority bit, P_ij,
set equal to one (steps S865 and S875). Otherwise, if
CCT < TS_ij + J_ij is true, the value of S_ij is tested in step S860.
If S_ij = 0, the queue is scheduled at the rate R_ij with P_ij = 0
(steps S870 and S880). Otherwise, the routine proceeds to
steps S865 and S875 and the queue is scheduled at the
minimum guaranteed rate with priority P_ij = 1.

When TS_ij is updated by adding to it either J_ij or I_ij, an
overflow bit, Z_ij, results. If Z_ij = 1, the timestamp TS_ij has
advanced to the next cycle; hence, WF_ij should be decre-
mented by one. Otherwise, WF_ij remains unchanged. This is
accomplished in step S890.
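The rescheduling decision can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the 12-bit timestamp width is inferred from the shift by 12, the function and parameter names are hypothetical, and the sketch assumes the timestamp is advanced by J_ij when the queue is scheduled at the minimum guaranteed rate and by I_ij when it is scheduled at the rate R_ij.

```python
TS_BITS = 12                      # assumed timestamp width (from the shift by 12)
TS_MASK = (1 << TS_BITS) - 1      # 0xFFF


def reschedule(ct, ts, wf, j, i, shape):
    """Return (new_ts, new_wf, priority_bit) for one queue.

    ct    : current time CT
    ts    : queue timestamp TS_ij
    wf    : wrap counter WF_ij
    j     : interval J_ij at the minimum guaranteed rate
    i     : interval I_ij at the dynamic rate R_ij
    shape : S_ij, nonzero when the queue must be shaped
    All names are hypothetical; j and i are assumed smaller than 2**TS_BITS.
    """
    # Current time expressed relative to the cycle of the timestamp.
    cct = ct + ((wf - 1) << TS_BITS)

    if cct < ts + j and shape == 0:
        step, priority = i, 0     # on time and unshaped: schedule at R_ij
    else:
        step, priority = j, 1     # late or shaped: minimum guaranteed rate

    total = ts + step
    overflow = 1 if total > TS_MASK else 0   # overflow bit Z_ij
    new_ts = total & TS_MASK
    new_wf = wf - 1 if overflow else wf      # timestamp moved to the next cycle
    return new_ts, new_wf, priority
```

For instance, `reschedule(ct=4000, ts=4090, wf=1, j=20, i=10, shape=1)` wraps the timestamp past the end of the cycle, so the sketch returns a small timestamp with the wrap counter decremented and the priority bit set.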

5.3 Service Scheduling 

During each cell time, at most one cell is transmitted from 
an input module onto the core TDM bus to its destination 
output port. As shown in FIG. 9, the queue from which to 
send the cell is determined by a round-robin with priority 
search based on the priority bits PV_ij (step S900). In the
stage two scheduler, a queue is considered eligible for
service if VQ_ij > 0 and the destination OP buffer for the queue
is not in the stop mode. If an eligible queue is found (yes in
step S900), the first cell in the queue is transmitted over the
TDM bus (step S910). Also, in step S910 both VQ_ij and AQ_ij
are decremented by one and the virtual queue priority bit,
PV_ij, is reset to zero. Recall that the value of VQ_ij indicates
the number of cells that are ready for servicing.
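A minimal Python sketch of this service step follows. It assumes the queues (i,j) of one input module are flattened into a single index and that a round-robin pointer `start` is maintained by the caller; these names, and the list-based layout, are illustrative assumptions rather than the hardware organization.

```python
def pick_queue(vq, pv, stopped, start=0):
    """Round-robin with priority over eligible virtual queues.

    vq      : list of virtual queue counters VQ
    pv      : list of virtual queue priority bits PV
    stopped : list of booleans, True if the destination OP buffer is in stop mode
    start   : hypothetical round-robin pointer (index searched first)
    Returns the selected queue index, or None if no queue is eligible.
    """
    n = len(vq)
    order = [(start + k) % n for k in range(n)]
    eligible = [q for q in order if vq[q] > 0 and not stopped[q]]
    if not eligible:
        return None
    for q in eligible:            # queues with the priority bit set go first
        if pv[q] == 1:
            return q
    return eligible[0]


def serve_one_cell(vq, aq, pv, stopped, start=0):
    """Serve at most one cell in this cell time and update the counters."""
    q = pick_queue(vq, pv, stopped, start)
    if q is None:
        return None
    vq[q] -= 1                    # one fewer cell ready for service
    aq[q] -= 1                    # one fewer cell actually queued
    pv[q] = 0                     # priority bit is reset after service
    return q
```

With `vq = [0, 2, 1]` and `pv = [0, 0, 1]`, the first call serves queue 2 because its priority bit is set, and the next call falls back to plain round-robin and serves queue 1.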

5.4 High-level Hardware View 

FIG. 10 shows a high-level view of the hardware for 
carrying out the scheduling operations. The main compo- 
nents are: 

1. Storage for the timestamps TS_ij. These can be imple-
mented in the form of a plurality of registers 100.

2. Array of comparators 110. The comparator associated
with queue (i,j) compares TS_ij and CT.

3. Storage for the actual queues AQ_ij. This can be imple-
mented as an array of counters 200.

4. Storage for the virtual queues VQ_ij 130.

5. Block 135 which performs priority round-robin (PRR) 
with respect to the outputs of the comparators (virtual 
rate shaping). 

6. Block 145 which performs PRR with respect to the 
virtual queues (service scheduling). 

7. Compute engine 150. 

8. "Stop/shape/go" signals from the core switch.

The PRR 135 for virtual rate shaping uses the priority bit
P_ij for queue (i,j). The virtual rate shaper looks for a queue
(i,j) with TS_ij ≤ CT, with priority given to queues with P_ij = 1.
If AQ_ij > VQ_ij, the virtual queue, VQ_ij, is incremented by one.
The PRR 145 for service scheduling serves the virtual
queues with VQ_ij > 0 in round-robin fashion with priority
given to those queues with PV_ij = 1. Virtual queue (i,j) is
eligible for service only if there is no stop signal corre-
sponding to the destination output port j.

The compute engine 150 dynamically updates the rates 
according to the DRC scheduling. The rate R_ij is computed
according to

$$R_{ij} = M_{ij} + w_i N_{ij} E_j \qquad (5)$$

based on: 

Information from the CAC: 

The minimum guaranteed rate M_ij.

The class weight w_i.

Stop/go/shape feedback from the core switch module.

The estimated number of active connections, N_ij, associ-
ated with queue (i,j).

The excess rate E_j, carried in IRM cells from output
module j.

The compute engine also updates the timestamps TS^y 
according to the scheduling and rescheduling algorithms 
described in subsection 5.2. 
5.5 Estimating Number of Active VCs 

The number of active VCs, N_ij, for queue (i,j) is used in
the computation of the rate R_ij and in the ER values which
are computed at the output modules. For real-time 
connections, the number of active VCs is taken to be the 
number of VCs which have been accepted by the CAC 
algorithm. For non-real-time connections, such as UBR and 
ABR, the number of VCs accepted by the CAC may be far 
greater than the actual number of VCs active at a given time.
This is because non-real-time VCs generally do not pay for
guaranteed QoS and hence may be idle for long periods of 
time. 

Thus, a method of estimating the number of VCs for 
non-real-time traffic is needed. For the 40 G switch of the 
preferred embodiment, a simple VC table lookup method is 
used. The table consists of a one-bit entry (initialized to 
zero), along with a queue identifier (i,j), for each non-real- 
time VC. Time is divided into intervals of length T_s. When
a cell belonging to VC k arrives in an interval, if the
corresponding table entry is a zero, the entry is set and the
count N_ij is incremented by one. Otherwise, if the table entry
is already set, no action is taken. At the end of the interval,
N_ij represents an estimate of the number of active VCs over
the interval. Before the start of the next interval, the counters
N_ij are all cleared. A smoother estimate of the number of
active VCs is obtained by exponential averaging:

where ε ∈ (0,1).
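The table lookup method can be illustrated with a short Python sketch. The class and method names are hypothetical. Two points are assumptions rather than statements from the text: the one-bit table entries are cleared together with the counters at the end of each interval, and the smoothing takes the common form "new average = ε × count + (1 - ε) × old average".

```python
class ActiveVcEstimator:
    """Per-interval estimate of active non-real-time VCs for each queue (i, j)."""

    def __init__(self, vc_to_queue, eps):
        # vc_to_queue maps a VC identifier to its queue identifier (i, j).
        self.vc_to_queue = dict(vc_to_queue)
        self.eps = eps                                   # smoothing weight in (0, 1)
        self.seen = {vc: 0 for vc in self.vc_to_queue}   # one-bit table entries
        queues = set(self.vc_to_queue.values())
        self.count = {q: 0 for q in queues}              # N_ij for this interval
        self.smoothed = {q: 0.0 for q in queues}         # averaged estimate

    def on_cell(self, vc):
        """Called for every arriving cell of a non-real-time VC."""
        if self.seen[vc] == 0:
            self.seen[vc] = 1
            self.count[self.vc_to_queue[vc]] += 1
        # If the entry is already set, no action is taken.

    def end_interval(self):
        """Close the interval: update the averages, then clear the state."""
        for q, n in self.count.items():
            # Assumed form of the exponential average.
            self.smoothed[q] = self.eps * n + (1 - self.eps) * self.smoothed[q]
            self.count[q] = 0
        for vc in self.seen:
            self.seen[vc] = 0      # assumption: table bits are cleared as well
        return dict(self.smoothed)
```

A VC that sends many cells in one interval is still counted once, which is the point of the one-bit entry.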

6. Rate Computation

6.1 DRC Rate

6.1.1 Single-loop Feedback

The general structure of a single feedback loop is depicted
in FIG. 2b. The rate values E_j (corresponding to output
module j) and E_jk (corresponding to output line k in output
module j) for DRC scheduling at the IM and OM, respectively,
are computed once every 0.5 ms (see section 4). We shall
explain how the DRC rate E_j is computed; E_jk is computed
in a similar fashion.
FIG. 11 shows a flow chart diagram for the computation of
the DRC scheduling rates. In FIG. 11, E(n) denotes a generic
DRC rate value computed during the nth (0.5 ms) sampling
interval. The symbol VS(n) denotes the sum of the virtual
queue sizes corresponding to the bottleneck point. For the
DRC value E_j, VS(n) represents the sum of all virtual queues
destined to output port j over all input modules. Similarly,
NS(n) denotes the total number of active VCs destined to
output module j over all input modules. For DRC value E_jk,
VS(n) represents the sum of all virtual queues corresponding
to output module j, output line k. In this case, NS(n) denotes
the number of active VCs in output module j destined for
output line k.

A closed-loop proportional-derivative controller is used to
compute E based on observations of the aggregate virtual
queue length at the OP bottleneck. When the OP channel
utilization exceeds a value U_0 (≈95%, see step S1110), the
controller adjusts the value of E so as to maintain the
aggregate virtual queue length corresponding to the OP
bottleneck close to a target value N_0. When the OP channel
utilization lies below U_0, the controller adjusts E so that
utilization will be brought close to U_0.

Let C_T(n) denote a count of the number of cells observed
at the output of the OP during the nth scanning interval. If
C is the number of cell times during one scanning interval,
the utilization at the nth interval is computed as
U(n) = C_T(n)/C (step S1100). Let V(n) denote the sum over all IMs of
the virtual queue lengths corresponding to the OP during the
nth interval. If U(n) > U_0, the error is computed as

$$D(n) = V(n) - N_0,$$

where N_0 is the target aggregate virtual queue length.
Otherwise, the error signal is computed based on a target
utilization, C_0 = U_0 C, and the error signal is computed as

$$D(n) = C_T(n) - C_0.$$

During each scanning interval, the bottleneck rate is com-
puted using the following proportional-derivative (PD) con-
trol equation, which attempts to drive the error to zero (step
S1140):

$$E(n+1) = E(n) - \alpha_0 D(n) - \alpha_1 D(n-1).$$

The coefficients α_0 and α_1 are constants which are designed
to ensure system stability and fast response time. In simu-
lation experiments performed by the present inventors, the
constants were set to: α_0 = 1.25 and α_1 = -0.75. The condition
that the rate must be greater than zero is ensured via the
operation

$$E(n+1) = \max\{E(n+1),\ 0\}.$$

The rate value must also be limited by the bottleneck line
rate; i.e.,

$$E(n+1) = \min\{E(n+1),\ C\}.$$

The rate is computed in units of [cells/0.5 ms]. The error
signal D(n) is stored as D(n-1), together with the value E(n),
for the next rate computation (step S1150).
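The single-loop computation can be sketched in Python as follows. This is a hedged illustration, not the patented routine: the function names are hypothetical, the utilization-based error is assumed to be the observed cell count minus the target count U_0·C, and the PD update is assumed to have the same shape as the ABR high gain control equation of section 6.2 without the scaling by the number of active VCs.

```python
def drc_error(cells_out, cell_times, vq_sum, u0=0.95, n0=0):
    """Error signal D(n) for one scanning interval.

    cells_out  : cells observed at the OP output during the interval
    cell_times : C, the number of cell times in one scanning interval
    vq_sum     : V(n), sum of virtual queue lengths for this OP over all IMs
    u0, n0     : target utilization and target aggregate virtual queue length
    """
    utilization = cells_out / cell_times
    if utilization > u0:
        return vq_sum - n0                 # regulate the virtual queue length
    return cells_out - u0 * cell_times     # assumed form: regulate utilization


def drc_update(e, d, d_prev, cell_times, a0=1.25, a1=-0.75):
    """Assumed PD update, clamped to the range [0, C] as described in the text."""
    e_next = e - a0 * d - a1 * d_prev
    e_next = max(e_next, 0.0)              # rate must not be negative
    return min(e_next, float(cell_times))  # rate limited by the bottleneck line rate
```

A positive error (queue above target, or utilization above target) lowers the rate, and a negative error raises it, which is the behaviour the surrounding text describes.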
6.1.2 Dual-loop Feedback

Dual-loop feedback requires that cells are queued in the
input modules according to output line (FIG. 2c). Also,
counters must be maintained of the number of cells queued
for each output line. Let AQ_jk denote the number of cells
queued for output line k in output module j. In this case, a
DRC rate, E_jk (corresponding to output line k in output
module j), is computed once every 0.5 ms (see section 4.4.2).
The computation of E_jk is similar to the computation of E_j
discussed in the single-loop case. However, in this case, the
actual queue size AQ_jk is used in place of the virtual queue
count denoted as VS in FIG. 11. The queue size AQ_jk is also
used in the computation of the ABR explicit rate for output
line k in output module j as discussed next.
6.2 ABR Explicit Rate
6.2.1 Output Module Bottleneck 

For ABR service, the explicit rate (ER) values are com- 
puted based on the sizes of the actual ABR-class queues. The 
method of ABR rate computation described here somewhat
resembles the one developed in the above-cited ABR Ser-
vice article by Kolarov and Ramamurthy, with modifications
to handle the higher 2.4 Gbps line speed and for switch-wise
implementation. The ABR rate computation is also per-
formed once every 0.5 ms. For each destination OP, an ER
value, ER_j, j = 1, ..., 16, is computed.

FIG. 12 gives a flow chart for the computation of the 
explicit rate ER. The flow chart applies to both the output 
module OM bottleneck and the output line bottleneck. 
C_abr(n) denotes the number of ABR cells which arrive
during the nth 0.5 ms interval. In step S1200 the utilization
for ABR during the nth interval is computed as:

$$U_{abr}(n) = C_{abr}(n)/C,$$

where C is the total number of cell times in the 0.5 ms
interval at the bottleneck rate.

Let AS(n) denote the size of the actual ABR queue
corresponding to the bottleneck point (output module or
output line) for the nth interval. That is, AS(n) is the sum of
the actual queue sizes for all ABR queues destined for the
given bottleneck. The value AS(n-1) is stored in memory. If
the difference between AS(n) and AS(n-1) exceeds a thresh-
old (exemplified in step S1210 as 150 cells), this indicates
that the ABR queue is growing too quickly and a fast control
must be used. Therefore, the IRR filter is called in step
S1215. The IRR filter is also called if AS(n) exceeds the
threshold T_hi (step S1220) or if the flag F = 1 (step S1230).

In step 1240, if it is determined that the utilization of ABR
traffic is less than the target, the routine proceeds to step
1250. Otherwise, the routine proceeds to step 1245 and the low
gain filter is applied. In step 1250, if it is determined that the
sum of actual ABR cells is less than the low threshold, then
the routine proceeds to step 1255 where a high gain filter is
applied. Otherwise, the routine reverts to step 1245 and the
low gain filter is applied.
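The branching among the three filters can be summarized in a small Python sketch. The function name, the parameter names and the string labels are hypothetical, and the threshold values passed in the example are placeholders; only the 150-cell growth limit is taken from the text.

```python
def select_abr_filter(as_n, as_prev, flag, u_abr, u_target,
                      t_hi, t_low, growth_limit=150):
    """Choose the ER filter for the current 0.5 ms interval.

    as_n, as_prev : actual ABR queue sizes AS(n) and AS(n-1)
    flag          : flag F maintained by the IRR filter
    u_abr         : measured ABR utilization for the interval
    u_target      : target ABR utilization (value is an assumption of the caller)
    t_hi, t_low   : high and low queue thresholds (values are assumptions)
    """
    if as_n - as_prev > growth_limit:
        return "IRR"          # queue is growing too quickly
    if as_n > t_hi:
        return "IRR"          # queue is above the high threshold
    if flag == 1:
        return "IRR"          # tight control remains in force
    if u_abr < u_target and as_n < t_low:
        return "high_gain"    # under-utilized and nearly empty queue
    return "low_gain"
```

For example, a small and slowly changing queue with utilization below the target selects the high gain filter, while the same queue with utilization at or above the target selects the low gain filter.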

FIG. 13 shows the operation of the IRR filter. The IRR
filter simply sets the ER rate equal to a fraction of the DRC
local rate E. In this case,

$$ER(n+1) = E(n)/2$$

(step 1310). The IRR filter sets or resets the flag F according
to whether the value of AS(n) is larger than or smaller than
the threshold T_hi (step 1320). Note from FIG. 12 that as
long as F = 1, the IRR filter will be called. This places a tight
control on ABR traffic. Note in FIG. 12 that the error signal
D(n-1) is updated and stored even though it is not used in
the IRR filter.

FIG. 14 shows the operation of the high gain filter. The
main control equation is:

$$ER(n+1) = ER(n) - \alpha_0 D(n)/NS_{abr}(n) - \alpha_1 D(n-1)/NS_{abr}(n-1),$$

where NS_abr(n) is an estimate of the sum of all active ABR
VCs corresponding to the given bottleneck, weighted by the
ABR class weight w_abr. The values of the filter coefficients
are the same as in the local DRC filter, i.e., α_0 = 1.2, α_1 = -0.75.
For the high gain filter, the filter coefficients are scaled
by NS_abr.

The routine proceeds as follows. In step 1400 the differ-
ence between the actual and target queue length is deter-
mined. In step 1410 the high gain filter is applied using the
difference calculated in step 1400. In step 1420 D(n-1) is
replaced by D(n) in preparation for the next iteration. In
step 1430, taking max{ER(n+1), 0} ensures that ER(n+1)
is not negative, while taking min{ER(n+1), E(n)} ensures
that ER(n+1) is not larger than the local DRC rate E(n). In
step 1440 all the ER values are shifted in time.
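As a rough illustration, the IRR and high gain updates can be written as two small Python functions. The names are hypothetical, the default coefficient values are simply the ones quoted in this passage, and the error terms are assumed to be "actual queue length minus target queue length" as in step 1400.

```python
def irr_er(e_local):
    """IRR filter: ER(n+1) is half of the local DRC rate E(n)."""
    return e_local / 2


def high_gain_er(er, d, d_prev, ns, ns_prev, e_local, a0=1.2, a1=-0.75):
    """High gain filter with the clamping of step 1430.

    er          : ER(n)
    d, d_prev   : error signals D(n) and D(n-1)
    ns, ns_prev : weighted active-VC estimates NS_abr(n) and NS_abr(n-1), assumed > 0
    e_local     : local DRC rate E(n), the upper bound on ER(n+1)
    """
    er_next = er - a0 * d / ns - a1 * d_prev / ns_prev
    er_next = max(er_next, 0.0)      # ER must not be negative
    return min(er_next, e_local)     # ER must not exceed the local DRC rate
```

Dividing each error term by the weighted VC count turns an aggregate queue error into a per-connection rate adjustment, which is why the result can be compared directly against the per-flow DRC rate.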

In the low gain filter (see FIG. 15), the control equation is:

$$\ldots - \beta_{10}\, ER(n-10).$$

Note that the coefficients are not scaled by NS_abr for the low
gain filter. The coefficient values for the low gain filter are
given in Table 3.

TABLE 3

Coefficient values for low gain ABR filter.

Coefficient    Value
α_0             0.0627
α_1            -0.0545
β_0             0.8864
β_1             0.0955
β_2             0.0545
β_3             0.0136
β_4            -0.0273
β_5            -0.0682
β_6            -0.1091
β_7            -0.1500
β_8            -0.1909
β_9            -0.2318
β_10           -0.2727
The routine for the low-gain filter is identical to that of
the high-gain filter, except that the gain equation is different.
Therefore, the explanation of the routine of FIG. 15 is
omitted herein.

6.3 Transmission of Control Information 

All DRC rates and ABR ER rate computations are per-
formed at the respective OMs. During each scanning
interval, each IM sends queue length information to all the
OMs. This information is transmitted via special control
cells called internal resource management (IRM) cells.
These cells are generated by the IMs and constitute control
signaling overhead.

Based on the queue length information, each OM j
computes a DRC rate, E_j, for local control, and an explicit
rate (ER), ER_j, for ABR source control. The ABR ER value
is transmitted to the remote ABR source via resource man-
agement (RM) cells traveling towards the source.
Analogously, IRM cells generated by the OM are used to 
carry the DRC rate information to the IMs. 

7. Buffer Management 
Operating in conjunction with the scheduler, each IM and
OM contains a queue manager which is responsible for
buffer allocation. In the large switch architecture according
to the preferred embodiment, the OM buffers handle con-
gestion arising from contention at output line bottlenecks,
while the IM buffers handle congestion arising from con-
tention at the OP bottlenecks. The queue managers in the IM
and OM are independent but have similar architectures. The
cell buffers in the IM and OM are shared among all queues,
with limits on maximum queue size.

Each queue has pre-assigned cell discard thresholds based
on the traffic class and QoS requirement. The discard thresh-
olds are listed as follows in increasing order of size:

Drop CLP=1 cells.

Early Packet Discard (EPD). Drop cells belonging to the
new packet.

Partial Packet Discard (PPD). Drop all cells.

The queue manager drops CLP=1 cells in any queue flow
which is being shaped to the minimum guaranteed rate. In
this way, CLP=0 traffic receives the minimum guaranteed
rate.
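The threshold policy can be sketched as a single decision function in Python. The function name, the parameter names and the use of "greater than or equal" comparisons are assumptions; in particular, `in_new_packet` is a hypothetical flag meaning that the cell belongs to a packet that began arriving while the queue was at or above the EPD threshold, and tracking that flag per connection is left to the caller.

```python
def should_drop(queue_len, clp, in_new_packet,
                t_clp1, t_epd, t_ppd, shaped_to_min=False):
    """Decide whether an arriving cell is discarded.

    queue_len              : current length of the queue the cell would join
    clp                    : cell loss priority bit of the cell (0 or 1)
    in_new_packet          : hypothetical flag, see the lead-in
    t_clp1 < t_epd < t_ppd : discard thresholds in increasing order
    shaped_to_min          : True if the queue is being shaped to its
                             minimum guaranteed rate
    """
    if queue_len >= t_ppd:
        return True                      # PPD region: drop all cells
    if queue_len >= t_epd and in_new_packet:
        return True                      # EPD region: drop cells of new packets
    if clp == 1 and (queue_len >= t_clp1 or shaped_to_min):
        return True                      # CLP=1 cells are dropped first
    return False
```

With thresholds of 100, 200 and 300 cells, a CLP=0 cell of an already accepted packet is kept until the queue reaches 300 cells, while a CLP=1 cell is dropped from 100 cells onward, or at any length if the queue is being shaped to its minimum rate.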

8. Performance Evaluation
The primary goal of DRC scheduling is to match bottle- 
neck rates in the switch so as to meet the twin objectives of 



preventing congestion while maintaining high efficiency. In 
addition, it distributes the unused bandwidth in a fair manner 
between competing classes. In this section, we present some 
representative simulation results to highlight the main per- 
formance characteristics of DRC scheduling in the switch 
design. 

8.1 Convergence of Rate Control 

Consider a switch loaded with two flows destined to the 
same output port OP 1 on the core switch module (see FIG. 
16): 

1. A CBR flow on IM 1 with constant input rate 0.58 and
minimum guaranteed rate M_1 = 0.6.

2. A UBR flow on IM 2 with constant input rate 0.9 and
minimum guaranteed rate M_2 = 0.3.

The UBR flow is misbehaving with respect to its minimum
guaranteed rate. This may occur, since UBR sources are not 
policed at the network edge. In contrast, the CBR source is 
actually transmitting at less than its minimum guaranteed 
rate. 

The DRC rate for flow i is computed as R_i = M_i + E, where
E is the available unused bandwidth computed via closed-
loop control. At time 0, the system is empty, so initially E = 1.
Thus, when the two flows are turned on simultaneously at
time 0, each flow can initially transmit at the line rate; i.e.,
R_i(0+) = 1, i = 1, 2. At time t = 0+, the aggregate flow rate to OP
1 is 1.48. Hence, the buffer at OP 1 builds and the global
virtual queue forms at the input modules. The DRC mecha-
nism reacts by decreasing the DRC rate E.

FIG. 17 shows a plot of the flow rates R_i(t). Observe that
the rates converge relatively quickly (in about 6 ms) to
steady-state values. The CBR flow uses a bandwidth of 0.58.
The UBR flow supplies IM 2 at the rate 0.9, but is guaran-
teed a throughput of only 0.3. Hence, the correct value for
the DRC rate E is 0.12, and the rates converge as:

$$R_1(t) \to 0.72, \qquad R_2(t) \to 0.42.$$



Note that although the CBR flow is permitted to transmit to 
OP 1 at rate 0.72, it enters IM 1 at the rate 0.58. On the other 
hand, the UBR flow is shaped at IM 2 to the rate 0.42. UBR 
cells will be dropped at the IM after the cell buffer capacity 
is exceeded. 
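The steady-state values quoted above follow from simple arithmetic, sketched here in Python. The function name is hypothetical, and the formula covers only this two-flow scenario, where the CBR flow sends less than it is allowed and the UBR flow always has cells to send, so the UBR flow absorbs whatever the CBR flow leaves unused.

```python
def steady_state(line_rate, cbr_input, m1, m2):
    """Steady-state excess rate E and DRC rates for the two-flow example.

    The backlogged UBR flow gets line_rate - cbr_input, which must equal
    M_2 + E, so E = line_rate - cbr_input - M_2.
    """
    e = line_rate - cbr_input - m2
    r1 = m1 + e          # rate permitted to the CBR flow
    r2 = m2 + e          # rate to which the UBR flow is shaped
    return e, r1, r2


e, r1, r2 = steady_state(line_rate=1.0, cbr_input=0.58, m1=0.6, m2=0.3)
print(round(e, 2), round(r1, 2), round(r2, 2))
```

The CBR flow's actual throughput (0.58) plus the UBR shaping rate adds up to the line rate, so the output port is fully used without a growing queue.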

8.2 Real-time vs. Non-real-time Delay Performance

To examine delay performance, we modify the above 
example by replacing flow 2 with a UBR flow having 
random on-off periods. The on and off periods are exponen- 
tially distributed with means 8 and 12 [cell times], respec- 
tively. During an on period, flow 2 transmits cells to IM 2 at 
a constant rate 0.93. Hence, the mean rate of flow 2 is 0.372.
Also, we shall set M_2 = 0.38. The delay metrics obtained from
the simulation run are shown in Table 4. The mean delay is
given in units of cell times, together with the corresponding
98% confidence intervals. Observe that the CBR flow expe-
riences very little delay and delay jitter.

TABLE 4

Delay performance with CBR flow 1 and UBR flow 2.

Delay Metric [cell times]    CBR flow 1      UBR flow 2
mean delay                   0.90 ± 0.066    2.54 ± 0.71
std. dev. delay              0.03            9.09
mean interdeparture          1.51 ± 0.11     1.61 ± 0.11
std. dev. interdeparture     0.67            4.28



Now consider the case when flow 1 is changed from the
CBR class to the UBR class. As a UBR flow, flow 1 will be
buffered in the non-real-time buffer of OP 1, along with flow
2. The simulation results for this example are given in Table
5. Observe that the mean delays of both flows in Table 5
have increased with respect to the corresponding delays in
Table 4. Note in particular, that all of the delay metrics of
flow 1 as a CBR flow are markedly better than the corre-
sponding metrics as a UBR flow. This example serves to
demonstrate the more stringent QoS control which the
switch architecture provides for real-time traffic over non-
real-time traffic.



TABLE 5

Delay performance with UBR flow 1 and UBR flow 2.

Delay Metric [cell times]    UBR flow 1      UBR flow 2
mean delay                   1.79 ± 0.42     1.59 ± 0.22
std. dev. delay              5.0             3.92
mean interdeparture          1.5 ± 0.11      1.62 ± 0.12
std. dev. interdeparture     1.03            4.26



8.3 DRC vs. Static Priority Scheduling 

FIG. 18 shows a switch loaded with three on-off type 
flows. The specification of the three flows is given in Table 
6. Each flow is associated with a traffic class, mean on and 
off periods, a rate during the on period, a source input 
module, a destination output module and a minimum guar-
anteed rate (for DRC).



TABLE 6

Specification of three flows.

Flow No.   Class     Mean On   Mean Off   On Rate   IM   OP   M_i
1          Rt-VBR    12        8          0.9       1    1    0.65
2          Nrt-VBR   8         12         0.93      1    2    0.3
3          Rt-VBR    7         13         0.93      2    1    0.33



Flows 1 and 3 are real-time VBR flows, while flow 2 is a 
non-real-time VBR flow. Further, flows 1 and 2 compete for 
service at IM 1. In this example, we are interested in
comparing DRC scheduling with static priority scheduling at
IM 1. Static priority gives strict priority to real-time VBR
flow 1 over non-real-time VBR flow 2. Clearly, flow 1 will
achieve the best delay performance under this scheme. 
However, this may have an adverse effect on flow 2. DRC 
scheduling provides a compromise by providing rate guar- 
antees to both flows. 

The delay results for DRC and static priority scheduling 
are shown, respectively, in Tables 7 and 8. Observe that 
under static priority, flow 1 experiences small delay. 
However, the delay of flow 2 is relatively large. Under DRC, 
the delay performance of flow 1 is compromised to a small 
extent, while the delay performance of flow 2 is improved 
significantly. 



TABLE 7

Delay result under DRC scheduling.

Delay Metric [cell times]    Flow 1         Flow 2         Flow 3
mean delay                   2S.0 ± 3.68    25.6 ± 3.62    345 ± 5.53
std. dev. delay              31.24          31.82          63.40
mean interdeparture          1.57 ± 0.11    X24 ± 0.14     2.11 ± 0.16
std. dev. interdeparture     1.02           2.00           3.14


TABLE 8

Delay result under static priority scheduling.

Delay Metric [cell times]    Flow 1         Flow 2          Flow 3
mean delay                   10.5 ± 1.81    121.5 ± 17.4    3.25 ± 0.33
std. dev. delay              26.48          104.3           7.72
mean interdeparture          1.56 ± 0.11    1.92 ± 0.14     1.57 ± 0.11
std. dev. interdeparture     2.09           3.68            3.94



As can be seen from the above description, the inventive 
switch efficiently serves cell streams having different QoS
requirements. Additionally, the inventive switch efficiently
multiplexes unicast and multicast transmission, using an
efficient priority scheme. Using buffers at the input, the core, 
and the output of the switch allows work conservation 
without exacerbation of loaded bottlenecks. Additionally, a 
shape feedback signal is used to temporarily stop work 
conservation in order to alleviate temporary congestion 
while ensuring the guaranteed minimum rate. 

What is claimed is: 

1. An ATM switch capable of supporting streams of
different classes having various quality of service 
requirements, comprising: 
a core switch comprising:
a TDM bus; 

a plurality of input ports connected to said TDM bus; 
a plurality of output buffers connected to said TDM 
bus; 

a plurality of output ports connected to respective 

output buffers; and 
a multicast output buffer connected to each of said 

output ports; 

a plurality of input modules connected to input side of 
said core switch, each of said input modules compris-
ing:

a plurality of output port planes corresponding to the 
number of said output ports, each of said output port 
planes having a plurality of input buffers; 
an input module scheduler for scheduling cells in said
input buffers; 

a plurality of output modules connected to output side 
of said core switch, each of said output modules 
comprising: 

a plurality of output line planes, each having a plurality
of output line buffers coupled to an output line; 
an output module scheduler for scheduling cells in said 
output buffers. 
2. The ATM switch of claim 1, wherein each of said 
output buffers of said core switch comprises overflow con-
trol for generating a shape signal when level of cells in a
respective output buffer reaches a first threshold, and gen-
erating a stop signal when the level of cells in the respective
output buffer reaches a second threshold.

3. The ATM switch of claim 1, further comprising:
connection admission control for issuing queue times- 
tamps for each cell in said plurality of input buffers; 

a timer issuing a current time; and, 

wherein said input module scheduler further comprises a 
comparator for comparing said queue timestamps to 
said current time, and scheduling eligibility for service
each of said cells having queue timestamp equal to
current time. 

4. The ATM switch of claim 3, wherein each of said input
module scheduler and output module scheduler further com- 
prises: 

a timestamp storage unit for storing the queue timestamps 
and providing said queue timestamps to said compara- 
tor; 

a virtual rate shaping unit connected to said comparator 
for shaping transmission rate of cells eligible for ser- 
vice; 

a plurality of virtual queue counters for counting the 

number of cells eligible for service; 
a service scheduling unit for scheduling cells eligible for 

service; 

a compute engine for dynamically updating the times- 
tamps. 

5. The ATM switch of claim 4, further comprising a rate 
feedback from said output modules to said input modules. 

6. The ATM switch of claim 1, wherein said plurality of
output buffers of said core switch are designated real-time 
output buffers and wherein said core switch further com- 
prises: 

a plurality of non-real-time output buffers corresponding 
to the number of real-time output buffers and each 
connected to one of said output ports; 

a non-real-time multicast buffer connected to each of said 
output ports. 

7. An ATM switch capable of supporting cell streams of 
different classes having various quality of service 
requirements, comprising: 

a core switch comprising: 
a TDM bus; 

a plurality of input ports connected to said TDM bus; 
a plurality of output buffers connected to said TDM 
bus; 

a plurality of output ports connected to respective 

output buffers; and 
a multicast output buffer connected to each of said 

output ports; 

a plurality of output modules connected to output side 
of said core switch, each of said output modules 
having a plurality of output lines and comprising: 
a plurality of output line planes, each having a plurality 
of output line buffers coupled to said output lines;
an output module scheduler for scheduling cells in said

output buffers; 
a plurality of input modules connected to input side of 
said core switch, each of said input modules compris- 
ing: 

a plurality of output port planes corresponding to the 
number of said output ports, each of said output port 
planes having a plurality of output line planes cor- 
responding to the number of said output lines, each 
of said output line planes having a plurality of input 
buffers;
an input module scheduler for scheduling cells in said
input buffers. 

8. The ATM switch of claim 7, wherein each of said 
output buffers of said core switch comprises overflow con-
trol for generating a shape signal when level of cells in a
respective output buffer reaches a first threshold, and gen- 
erating a stop signal when the level of cells in the respective 
output buffer reaches a second threshold. 

9. The ATM switch of claim 7, further comprising: 

connection admission control for issuing queue times-
tamps for each cell in said plurality of input buffers; 

a timer issuing a current time; and, 

wherein said input module scheduler further comprises a 
comparator for comparing said queue timestamps to 
said current time, and scheduling eligibility for service 
each of said cells having queue timestamp equal to 
current time. 

10. The ATM switch of claim 9, wherein each of said 
input module scheduler and output module scheduler further

comprises: 

a timestamp storage unit for storing the queue timestamps 
and providing said queue timestamps to said compara- 
tor; 

a virtual rate shaping unit connected to said comparator
for shaping transmission rate of cells eligible for ser- 
vice; 

a plurality of virtual queue counters for counting the 

number of cells eligible for service; 
a service scheduling unit for scheduling cells eligible for 

service; 

a compute engine for dynamically updating the times- 
tamps. 

11. The ATM switch of claim 10, further comprising a rate
feedback from said output modules to said input modules. 

12. The ATM switch of claim 7, wherein said plurality of 
output buffers of said core switch are designated real-time 
output buffers and wherein said core switch further com- 
prises: 

a plurality of non-real-time output buffers corresponding
to the number of real-time output buffers and each 
connected to one of said output ports; 
a non-real-time multicast buffer connected to each of said
output ports.

13. An ATM switch capable of supporting streams of 
different classes having various quality of service 
requirements, comprising: 

a core switch comprising: 
a TDM bus;

a plurality of input ports connected to said TDM bus; 
a plurality of output buffers connected to said TDM 
bus; 

a plurality of output ports connected to respective 
output buffers;

a plurality of input modules connected to input side of 
said core switch, each of said input modules compris- 
ing: 

a plurality of input buffers; 
a connection admission control assigning a minimum
guaranteed rate and an excess share weight
for each of said input buffers; 
an input module scheduler for scheduling cells in said 
input buffers according to a rate composed of said 
minimum guaranteed rate and a share of the available
unused bandwidth, said share being propor-
tional to the excess share weight.

14. The ATM switch of claim 13, further comprising a
closed-loop controller for adjusting the available unused
bandwidth according to cell occupancy of said output buff-
ers.

15. The ATM switch of claim 14, further comprising an
overload control for transmitting a shape rate signal from
any of said output buffers whenever cell occupancy of a
respective one of said output buffers reaches a predeter-
mined shape threshold, and wherein said scheduler is
responsive to said shape signal to schedule said input buffers
according to the guaranteed minimum rate only.

16. A tripled buffered ATM switch capable of supporting 
streams of different classes having various quality of service 
requirements, comprising: 

a plurality of input buffers for queuing incoming cell
streams; 

a plurality of core buffers receiving queued cells of said 
cell streams from said input buffers and providing said 
queued cell to respective output ports; 

a plurality of output buffers receiving said queued cells 
from said output ports and providing said queued cell 
to respective output lines; 

a scheduler for scheduling transmission of said queued 
cells from said input buffers;

a first feedback loop from said core buffers to said 
scheduler, for transmission of load information of said 
core buffers to said scheduler; 

a second feedback loop from said output buffers to said
scheduler, for transmission of load information of said 
output buffers to said scheduler; 

wherein said scheduler schedules transmission of said 
queued cells from said input module according to 
information received from said first and second feed- 
back loops. 

17. The tripled buffered ATM switch according to claim 

16, further comprising a connection admission control for 
assigning a minimum guaranteed rate to said cell streams, 
and wherein said scheduler schedules said queued cells at
the input buffers at a rate not less than said minimum
guaranteed rate.

18. The tripled buffered ATM switch according to claim
17, wherein said scheduler schedules said queued cells at the
input buffers according to a dynamic rate which comprises
the minimum guaranteed rate plus a distribution of an
unused bandwidth determined from the information of said
first and second feedback loops.
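
As an illustration only, and not part of the claim language, the dynamic rate recited above can be sketched as a minimum guaranteed rate plus a share of the unused bandwidth. The function name, the `weights` parameter, and the proportional sharing rule are assumptions made for this sketch; the claim itself does not fix how the unused bandwidth is distributed.

```python
def dynamic_rates(min_rates, unused_bandwidth, weights=None):
    """Return one dynamic rate per flow.

    min_rates: minimum guaranteed rate of each flow (hypothetical units).
    unused_bandwidth: spare bandwidth estimated from the feedback loops.
    weights: assumed sharing weights; equal shares when omitted.
    """
    if weights is None:
        weights = [1.0] * len(min_rates)
    total = sum(weights)
    # Each flow keeps its guaranteed minimum and adds its weighted share
    # of whatever bandwidth is currently unused.
    return [m + (w / total) * unused_bandwidth
            for m, w in zip(min_rates, weights)]
```

With two flows guaranteed 10 and 20 units and 30 units unused, equal sharing under these assumptions gives each flow 15 additional units.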

19. The tripled buffered ATM switch according to claim
18, wherein said connection admission control further
assigns queue timestamps to each of said queues, and
wherein said scheduler further comprises a comparator for
comparing the queue timestamps to current time and defining
each queue having a queue timestamp equal to or less
than the current time as eligible for service.
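
As an illustration only, and not part of the claim language, the comparator's eligibility test can be sketched as follows. The dictionary representation of queue timestamps and the function name are assumptions made for this sketch.

```python
def eligible_queues(queue_timestamps, current_time):
    """Return the ids of queues whose timestamp is <= current_time.

    queue_timestamps: mapping from a hypothetical queue id to its timestamp.
    """
    # A queue is eligible for service when its timestamp is equal to or
    # less than the current time.
    return [qid for qid, ts in queue_timestamps.items() if ts <= current_time]
```

A queue stamped exactly at the current time is eligible; a queue stamped one tick later is not.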

20. A scheduler for an ATM switch, comprising: 

a first memory for storing cell timestamps assigned to
cells in queues to be scheduled; 

a second memory for storing actual queue load; 

a third memory for storing cells in virtual queues; 

a current time generator generating current time; 

a plurality of comparators for comparing the cell timestamps
to current time and designating cells having cell
timestamps ≤ current time as eligible for service;

a virtual rate selector for assigning cells from said eligible 
cells to said virtual queues; 

a service scheduling selector which selects queues for 
service from said virtual queues; 

a compute engine for scheduling and rescheduling cells in 
said first memory. 

21. A buffer, comprising: 

a first monitoring circuit for monitoring a load level in 
said buffer and for generating a shape signal when said 
load level reaches a first threshold, so as to cause input 
to said buffer to be reduced to a minimum level, and 
further for generating a stop signal when said load level 
reaches a second threshold, so as to halt any input to 
said buffer. 
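
As an illustration only, and not part of the claim language, the two-threshold monitoring can be sketched as follows. The signal names returned as strings, the use of "greater than or equal" for "reaches", and the assumption that the second (stop) threshold is not lower than the first (shape) threshold are choices made for this sketch.

```python
def buffer_signal(load, shape_threshold, stop_threshold):
    """Return "stop", "shape", or "none" for the given load level.

    Assumes stop_threshold >= shape_threshold (an assumption of this sketch).
    """
    if load >= stop_threshold:
        # Second threshold reached: halt any input to the buffer.
        return "stop"
    if load >= shape_threshold:
        # First threshold reached: reduce input to the minimum level.
        return "shape"
    return "none"
```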

22. The buffer of claim 21, further comprising an estimation
circuit for estimating an unused bandwidth available on
said buffer and generating a signal indicating said estimated 
bandwidth. 

23. A scheduler for an ATM switch, comprising: 

a first memory for storing queue timestamps assigned to 
queues to be scheduled; 

a second memory for storing actual queue load; 

a third memory for storing cells in virtual queues; 

a current time generator generating current time;

a plurality of comparators for comparing the queue timestamps
to current time and designating queues having
cell timestamps ≤ current timestamp as eligible for service;

a virtual rate selector for assigning cells from said eligible
cells to said virtual queues;

a service scheduling selector which selects queues for
service from said virtual queues;

a compute engine for scheduling and rescheduling cells in
said first memory.